Pareta SDK
pareta is the official client for Pareta, available for Python and TypeScript/JavaScript — same API, same pareta_sk_ key, same four things:
- Deploys open-weights models as live endpoints. You name a task and a model; Pareta picks the GPU and serving config. There is no hardware knob.
- Serves metered OpenAI-compatible inference. A deployed endpoint speaks the OpenAI chat-completions wire format, so this SDK and the stock openai client are interchangeable against it.
- Evaluates models on your own data. Score open candidates and frontier baselines on your rows, then read per-model quality and cost.
- Browses the benchmark catalog. Match a sentence to a task, read its leaderboard, and find the model worth deploying.
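Because a deployed endpoint speaks the OpenAI chat-completions wire format, the request it accepts can be pictured as plain JSON. The sketch below is illustrative only: build_chat_request is a hypothetical helper (not part of the SDK), "ep_123" is a placeholder endpoint id, and only the model and messages fields are taken from the samples on this page.

```python
import json


def build_chat_request(endpoint_id: str, prompt: str) -> dict:
    """Build a chat-completions request body (hypothetical helper, not an SDK call)."""
    return {
        "model": endpoint_id,  # the endpoint id goes in the model field
        "messages": [{"role": "user", "content": prompt}],
    }


# "ep_123" is a placeholder; a real id comes back from endpoints.deploy().
body = build_chat_request("ep_123", "Say hello in one sentence.")
print(json.dumps(body, indent=2))
```

The same body shape is what the stock openai client sends, which is why the two clients can be swapped against one endpoint.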
A few platform truths shape the whole API:
- GPUs are hidden. endpoints.deploy() takes a task and a model, never hardware.
- Models are per-task aliases. Open-weights ids are masked to public aliases like qwen-vl-2. Real ids never cross the SDK boundary. Frontier (vendor) ids are in the clear.
- Inference and evals are metered against your org balance. A successful call debits credit. An empty balance raises InsufficientCreditsError (402). An eval run reports its billed total on run.cost (dollars). Top-up is browser-only; the SDK never touches billing.
Python or TypeScript? Both clients are at full parity. The one design difference: Python ships sync (Pareta) and async (AsyncPareta) clients; TypeScript has a single Promise-only Pareta (every method is async). Code samples throughout these docs show Python and TypeScript side by side.
Install
Python
pip install pareta # or: uv add pareta / poetry add pareta
TypeScript
npm install pareta # or: pnpm add pareta / yarn add pareta / bun add pareta
Hello world
Mint a pareta_sk_ key in the dashboard, export it as PARETA_API_KEY, then deploy and call a model:
Python
from pareta import Pareta
pa = Pareta.from_env() # reads PARETA_API_KEY
ep = pa.endpoints.deploy(task="contract-key-fields", model="recommended", wait=True)
resp = pa.chat.completions.create(
    model=ep.id,  # the endpoint id
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
)
print(resp.choices[0].message.content)
TypeScript
import { Pareta } from "pareta";
const pa = Pareta.fromEnv(); // reads PARETA_API_KEY
const ep = await pa.endpoints.deploy({ task: "contract-key-fields", model: "recommended", wait: true });
const resp = await pa.chat.completions.create({
  model: ep.id, // the endpoint id
  messages: [{ role: "user", content: "Say hello in one sentence." }],
});
console.log(resp.choices[0].message.content);
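Before constructing the client, it can help to fail fast on a missing or malformed key. The following is a minimal sketch, assuming every key starts with the pareta_sk_ prefix mentioned above. load_api_key is a hypothetical helper, not an SDK function, and the SDK's own from_env may validate differently.

```python
import os
from typing import Mapping, Optional

KEY_PREFIX = "pareta_sk_"  # prefix taken from the docs; the rest of the key format is unknown


def load_api_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Return PARETA_API_KEY from the environment, or raise with a clear message."""
    env = os.environ if env is None else env
    key = env.get("PARETA_API_KEY", "").strip()
    if not key:
        raise RuntimeError("PARETA_API_KEY is not set")
    if not key.startswith(KEY_PREFIX):
        raise RuntimeError("PARETA_API_KEY does not start with pareta_sk_")
    return key


# A dict stands in for the process environment so the example has no side effects.
print(load_api_key({"PARETA_API_KEY": "pareta_sk_example"}))
```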
Guide
Start-to-finish, in reading order — every page shows Python and TypeScript. See the guide index.
- Installation & authentication: install pareta (pip or npm), authenticate with a pareta_sk_ key, make a first metered call.
- Quickstart: deploy the recommended model and run inference end to end in about a dozen lines.
- Core concepts: tasks, open vs frontier models, per-task aliases, hidden hardware, metering, and the match to leaderboard to eval to deploy funnel.
- Running inference: chat.completions.create, streaming, passthrough params, models.list, and metering errors.
- Deploying & operating endpoints: deploy wait semantics, lifecycle, and metrics.
- Finding the right model: match intent, rank with leaderboard/recommended, list frontier baselines.
- Evaluating on your own data: evals.sets and evals.runs, per-model quality/CIs/cost, and the metered run total.
- Errors, retries & timeouts: the ParetaError hierarchy, which errors to catch, and the retry policy.
- Async & concurrency: Python's AsyncPareta vs TypeScript's Promise-only client, and fanning out concurrent calls.
- Configuration: API key, base URL, timeouts, retries, and injecting a custom HTTP client.
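As a taste of what the streaming material covers, here is a rough sketch of accumulating streamed text. It assumes chunks follow the OpenAI streaming shape (choices[0].delta.content). Plain dicts stand in for the SDK's chunk objects, which are not shown on this page, and the sample chunks are made up.

```python
from typing import Iterable


def accumulate_text(chunks: Iterable[dict]) -> str:
    """Join the delta.content pieces of OpenAI-style stream chunks into one string."""
    parts = []
    for chunk in chunks:
        choices = chunk.get("choices") or []
        if not choices:
            continue  # some chunks carry no choices at all
        content = choices[0].get("delta", {}).get("content")
        if content:  # role-only and final chunks have no content
            parts.append(content)
    return "".join(parts)


# Made-up chunks in the assumed shape: role first, text deltas, then a stop marker.
sample_chunks = [
    {"choices": [{"delta": {"role": "assistant"}}]},
    {"choices": [{"delta": {"content": "Hello"}}]},
    {"choices": [{"delta": {"content": ", world."}}]},
    {"choices": [{"delta": {}, "finish_reason": "stop"}]},
]
text = accumulate_text(sample_chunks)
print(text)
```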
Examples
Copy-paste workflows for real jobs, in both languages. See the examples index.
- Deploy a model and call it: the two-call deploy-then-infer workflow.
- From a sentence to a deployed winner: the full match to eval to deploy funnel.
- Benchmark models on your own data: eval open candidates against frontier baselines and read run.cost.
- Document extraction (PDF/image): the blob-task loop (upload documents, eval, deploy, infer).
- Streaming chat completions: iterate chat chunks and accumulate text.
- Concurrent calls: fan out inference and eval calls (asyncio.gather / Promise.all).
- Cost & quality monitoring: read what calls cost and watch a live endpoint with endpoints.metrics().
- Migrating from the OpenAI SDK: keep using openai against Pareta, and when to switch to pareta.
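The fan-out pattern behind the concurrent-calls example can be sketched with asyncio alone. In the sketch below, fake_infer is a stand-in for an awaited AsyncPareta call (the real signature is not shown here), and the concurrency limit of 4 is an arbitrary assumption, not a documented rate limit.

```python
import asyncio
from typing import List, Sequence


async def fake_infer(prompt: str) -> str:
    """Stand-in for an awaited inference call; it only simulates network latency."""
    await asyncio.sleep(0.01)
    return prompt.upper()


async def fan_out(prompts: Sequence[str], limit: int = 4) -> List[str]:
    """Run fake_infer over all prompts with at most `limit` calls in flight."""
    sem = asyncio.Semaphore(limit)

    async def one(prompt: str) -> str:
        async with sem:  # cap the number of concurrent calls
            return await fake_infer(prompt)

    # gather returns results in the same order as the inputs.
    return list(await asyncio.gather(*(one(p) for p in prompts)))


results = asyncio.run(fan_out(["a", "b", "c"]))
print(results)
```

Bounding concurrency with a semaphore keeps a large batch from opening every request at once; since calls are metered, it also limits how much credit is in flight at any moment.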
Reference
Field-by-field API docs. Signatures are shown in Python; the TypeScript API mirrors them (camelCase names, options objects, awaited) — see any guide page for the TS form. See the reference index.
- Client: Pareta (and Python's AsyncPareta), covering from_env/fromEnv, constructor params, lifecycle, and the five resource namespaces.
- chat.completions: chat.completions.create, return types, streaming, and the error surface.
- models: models.list() and the Model fields.
- endpoints: deploy/list/retrieve/start/stop/delete, the Endpoint object, and metrics(id).
- tasks: list/retrieve/match/leaderboard/recommended and their response models.
- evals: evals.sets, evals.runs, and evals.frontierModels.
- Exceptions: the ParetaError hierarchy and status-to-class mapping.
- Response types: every response object plus the .cost vs .costMicroUsd money convention.
- Underlying HTTP API: the /v1 routes the SDK wraps (language-neutral).
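The .cost vs .costMicroUsd convention can be pictured as one amount expressed in two units. The sketch below assumes that costMicroUsd is an integer count of millionths of a dollar and that cost is the same amount in dollars; the Response types page is the authority on the actual field types. micro_usd_to_dollars and total_dollars are hypothetical helpers, not SDK functions.

```python
from decimal import Decimal
from typing import Iterable

MICRO_PER_DOLLAR = 1_000_000  # "micro" means one millionth


def micro_usd_to_dollars(micro_usd: int) -> Decimal:
    """Convert an integer micro-USD amount to dollars without float rounding."""
    return Decimal(micro_usd) / MICRO_PER_DOLLAR


def total_dollars(costs_micro_usd: Iterable[int]) -> Decimal:
    """Sum per-call costs as integers first, then convert once."""
    return micro_usd_to_dollars(sum(costs_micro_usd))


print(micro_usd_to_dollars(1_250_000))
print(total_dollars([500_000, 250_000]))
```

Summing in integer micro-units and converting once avoids the drift that can build up when many small float dollar amounts are added together.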