Errors, retries & timeouts

Every failure the SDK can raise is a subclass of ParetaError, so one except clause catches everything, and a more specific clause catches exactly the case you care about. The client also retries transient failures for you (network blips, 429s, 5xx) with exponential backoff before giving up. This page is the map: which exception means what, what is retried automatically, and how to tune the timeout and retry budget.

Import the exceptions straight from the package:

Python

from pareta import (
    Pareta,
    ParetaError,               # base class for everything below
    APIConnectionError,        # never reached the server (DNS/TCP/TLS)
    APITimeoutError,           # subclass of APIConnectionError
    APIStatusError,            # any non-2xx from the server
    BadRequestError,           # 400, 422
    AuthenticationError,       # 401
    PermissionDeniedError,     # 403
    InsufficientCreditsError,  # 402 (org out of balance)
    NotFoundError,             # 404
    ConflictError,             # 409
    RateLimitError,            # 429
    EndpointNotReadyError,     # 503 (endpoint stopped/cold/provider down)
)

TypeScript

import {
  Pareta,
  ParetaError,              // base class for everything below
  APIConnectionError,       // never reached the server (DNS/TCP/TLS)
  APITimeoutError,          // subclass of APIConnectionError
  APIStatusError,           // any non-2xx from the server
  BadRequestError,          // 400, 422
  AuthenticationError,      // 401
  PermissionDeniedError,    // 403
  InsufficientCreditsError, // 402 (org out of balance)
  NotFoundError,            // 404
  ConflictError,            // 409
  RateLimitError,           // 429
  EndpointNotReadyError,    // 503 (endpoint stopped/cold/provider down)
} from "pareta";

The hierarchy

ParetaError
├── APIConnectionError            request never reached the server
│   └── APITimeoutError           timed out before any response
└── APIStatusError                server returned a non-2xx status
    ├── BadRequestError           400, 422
    ├── AuthenticationError       401
    ├── InsufficientCreditsError  402
    ├── PermissionDeniedError     403
    ├── NotFoundError             404
    ├── ConflictError             409
    ├── RateLimitError            429
    └── EndpointNotReadyError     503

ParetaError is also raised directly (not as an APIStatusError) in two non-HTTP cases: constructing a client with no API key, and an evals.runs.wait() poll loop that exceeds its timeout. See Timeouts below.
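
Because the classes nest, the order of except clauses matters: a clause for a parent class listed first would shadow its children. The sketch below illustrates this with local stand-in classes that mirror part of the tree above; they are not the real pareta classes, and `classify` is a hypothetical helper used only for demonstration.

```python
# Stand-in classes mirroring part of the documented tree.
# In real code these would be imported from `pareta`.
class ParetaError(Exception): ...
class APIConnectionError(ParetaError): ...
class APITimeoutError(APIConnectionError): ...
class APIStatusError(ParetaError): ...
class NotFoundError(APIStatusError): ...


def classify(exc: Exception) -> str:
    """Report which handler runs; the most specific clauses are listed first."""
    try:
        raise exc
    except APITimeoutError:        # child of APIConnectionError, so it goes first
        return "timeout"
    except APIConnectionError:
        return "connection"
    except NotFoundError:          # child of APIStatusError, so it goes first
        return "not-found"
    except APIStatusError:
        return "status"
    except ParetaError:            # catch-all for anything the SDK raises
        return "base"
```

Swapping the first two clauses would route every timeout to the "connection" branch, which is why the catch-all base class belongs last.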

Status code to exception

The server is FastAPI, so error bodies are {"detail": "<message>"} with an HTTP status. The SDK maps the status to the most specific subclass so you catch by meaning, not by sniffing integers.

Status      Exception                 What it means
400, 422    BadRequestError           Request validation failed (bad params, malformed body)
401         AuthenticationError       API key missing or invalid
402         InsufficientCreditsError  Org is out of balance; top up in the dashboard
403         PermissionDeniedError     Authenticated, but not allowed to do this
404         NotFoundError             Endpoint / eval set / run / task id does not exist
409         ConflictError             Conflict (seed endpoint, transient lock/contention)
429         RateLimitError            Rate limited; honor Retry-After
503         EndpointNotReadyError     Target endpoint is stopped, cold-starting, or its provider is down
other 5xx   APIStatusError            Generic server error
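
One way to picture the mapping is as a lookup table with a generic fallback. This is a minimal sketch, not the SDK's actual implementation: the classes are local stand-ins, and `STATUS_TO_ERROR` and `error_for` are hypothetical names. It also falls back to the generic class for any unlisted status, which is an assumption beyond the "other 5xx" row.

```python
# Stand-in base class; the real one is pareta.APIStatusError.
class APIStatusError(Exception):
    def __init__(self, status_code: int, detail: str = ""):
        # Message falls back to "HTTP <code>" when the server sent no detail.
        super().__init__(detail or f"HTTP {status_code}")
        self.status_code = status_code
        self.detail = detail


class BadRequestError(APIStatusError): ...
class AuthenticationError(APIStatusError): ...
class InsufficientCreditsError(APIStatusError): ...
class PermissionDeniedError(APIStatusError): ...
class NotFoundError(APIStatusError): ...
class ConflictError(APIStatusError): ...
class RateLimitError(APIStatusError): ...
class EndpointNotReadyError(APIStatusError): ...


# Hypothetical lookup table built from the documented status table.
STATUS_TO_ERROR = {
    400: BadRequestError,
    422: BadRequestError,
    401: AuthenticationError,
    402: InsufficientCreditsError,
    403: PermissionDeniedError,
    404: NotFoundError,
    409: ConflictError,
    429: RateLimitError,
    503: EndpointNotReadyError,
}


def error_for(status_code: int, detail: str = "") -> APIStatusError:
    """Pick the most specific class, or the generic one for unlisted codes."""
    cls = STATUS_TO_ERROR.get(status_code, APIStatusError)
    return cls(status_code, detail)
```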

Reading an APIStatusError

Every APIStatusError carries the fields you need to log and debug. request_id comes from the x-request-id response header and is the fastest thing to quote in a support thread.

Python

from pareta import Pareta, APIStatusError

with Pareta.from_env() as pa:
    try:
        pa.endpoints.retrieve("ep_does_not_exist")
    except APIStatusError as e:
        print(e.status_code)  # 404
        print(e.detail)       # server's `detail` string (or raw body)
        print(e.request_id)   # "req_…"; quote this in bug reports
        print(e.response)     # the underlying httpx.Response, for advanced use

TypeScript

import { Pareta, APIStatusError } from "pareta";

const pa = Pareta.fromEnv();
try {
  await pa.endpoints.retrieve("ep_does_not_exist");
} catch (e) {
  if (e instanceof APIStatusError) {
    console.log(e.status);    // 404
    console.log(e.detail);    // server's `detail` string (or raw body)
    console.log(e.requestId); // "req_…"; quote this in bug reports
    console.log(e.response);  // the underlying fetch Response, for advanced use
  }
}

str(e) is the server's detail message when present, otherwise HTTP <code>.

The errors worth catching

Most code only needs to handle a handful of these explicitly. The rest are fine to let bubble up to a top-level except ParetaError.

InsufficientCreditsError (402) — out of balance

Both inference and evals are metered against your org's balance. A successful chat.completions.create() debits the balance; an evals.runs.create() debits for the open and frontier compute it runs. When the balance can't cover the call, you get a 402. Top-up is browser-only — the SDK exposes no balance or payment surface — so the right move is to surface a clear message pointing at the dashboard.

Python

from pareta import Pareta, InsufficientCreditsError

with Pareta.from_env() as pa:
    try:
        resp = pa.chat.completions.create(
            model="ep_contract_kie",
            messages=[{"role": "user", "content": "Extract the parties."}],
        )
    except InsufficientCreditsError:
        raise SystemExit("Org balance is empty. Top up at https://pareta.ai dashboard.")

TypeScript

import { Pareta, InsufficientCreditsError } from "pareta";

const pa = Pareta.fromEnv();
try {
  const resp = await pa.chat.completions.create({
    model: "ep_contract_kie",
    messages: [{ role: "user", content: "Extract the parties." }],
  });
} catch (e) {
  if (e instanceof InsufficientCreditsError) {
    throw new Error("Org balance is empty. Top up at https://pareta.ai dashboard.");
  }
  throw e;
}

NotFoundError (404) — wrong id

A stale or mistyped endpoint id, eval set id, run id, or task id.

Python

from pareta import Pareta, NotFoundError

with Pareta.from_env() as pa:
    try:
        ep = pa.endpoints.retrieve("ep_maybe_deleted")
    except NotFoundError:
        ep = pa.endpoints.deploy(task="contract-key-fields", wait=True)  # redeploy

TypeScript

import { Pareta, NotFoundError } from "pareta";

const pa = Pareta.fromEnv();
let ep;
try {
  ep = await pa.endpoints.retrieve("ep_maybe_deleted");
} catch (e) {
  if (e instanceof NotFoundError) {
    ep = await pa.endpoints.deploy({ task: "contract-key-fields", wait: true }); // redeploy
  } else {
    throw e;
  }
}

EndpointNotReadyError (503) — endpoint not serving

The endpoint exists but isn't serving yet: it's stopped, cold-starting, or the provider is briefly unavailable. The SDK already retries 503 a couple of times (see Automatic retries); if it still surfaces, start the endpoint and retry your call. Remember that hardware is fully managed — start() takes no GPU knob, just the endpoint id.

Python

from pareta import Pareta, EndpointNotReadyError

with Pareta.from_env() as pa:
    try:
        resp = pa.chat.completions.create(
            model="ep_contract_kie",
            messages=[{"role": "user", "content": "ping"}],
        )
    except EndpointNotReadyError:
        pa.endpoints.start("ep_contract_kie")          # warm it back up
        ep = pa.endpoints.retrieve("ep_contract_kie")  # poll until ep.is_live, then retry

TypeScript

import { Pareta, EndpointNotReadyError } from "pareta";

const pa = Pareta.fromEnv();
try {
  const resp = await pa.chat.completions.create({
    model: "ep_contract_kie",
    messages: [{ role: "user", content: "ping" }],
  });
} catch (e) {
  if (e instanceof EndpointNotReadyError) {
    await pa.endpoints.start("ep_contract_kie"); // warm it back up
    const ep = await pa.endpoints.retrieve("ep_contract_kie"); // poll until ep.isLive, then retry
  } else {
    throw e;
  }
}
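
The "poll until live, then retry" step in the comments above can be sketched in Python as a small helper. This is an illustration under stated assumptions, not part of the SDK: `call_with_warmup` and `NotReady` are hypothetical names (the latter stands in for EndpointNotReadyError), and the 120s deadline and 2s interval are arbitrary defaults. The SDK calls are passed in as plain callables so the sketch stays self-contained.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class NotReady(Exception):
    """Stand-in for pareta.EndpointNotReadyError."""


def call_with_warmup(
    call: Callable[[], T],
    start: Callable[[], None],
    is_live: Callable[[], bool],
    timeout: float = 120.0,
    interval: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `call`; if the endpoint is not ready, start it, wait until live, retry once."""
    try:
        return call()
    except NotReady:
        start()  # ask the platform to warm the endpoint back up
    deadline = time.monotonic() + timeout
    while not is_live():
        if time.monotonic() >= deadline:
            raise TimeoutError(f"endpoint not live after {timeout:g}s")
        sleep(interval)
    return call()  # a second NotReady here propagates to the caller
```

With the real client, `call` would wrap `pa.chat.completions.create(...)`, `start` would wrap `pa.endpoints.start("ep_contract_kie")`, and `is_live` would be `lambda: pa.endpoints.retrieve("ep_contract_kie").is_live`.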

RateLimitError (429) — slow down

Already retried automatically, honoring the server's Retry-After. You only see it after the retry budget is exhausted. Back off and try again later.

Python

from pareta import Pareta, RateLimitError

with Pareta.from_env() as pa:
    try:
        pa.chat.completions.create(
            model="ep_contract_kie",
            messages=[{"role": "user", "content": "hi"}],
        )
    except RateLimitError as e:
        print(f"Still rate limited after retries (request {e.request_id}); back off.")

TypeScript

import { Pareta, RateLimitError } from "pareta";

const pa = Pareta.fromEnv();
try {
  await pa.chat.completions.create({
    model: "ep_contract_kie",
    messages: [{ role: "user", content: "hi" }],
  });
} catch (e) {
  if (e instanceof RateLimitError) {
    console.log(`Still rate limited after retries (request ${e.requestId}); back off.`);
  } else {
    throw e;
  }
}

AuthenticationError (401) vs missing key

A 401 means the key reached the server and was rejected (wrong or revoked). That is distinct from constructing a client with no key at all, which fails fast client-side with a plain ParetaError before any request goes out:

Python

import pareta

try:
    pa = pareta.Pareta(api_key="")  # or PARETA_API_KEY unset with from_env()
except pareta.ParetaError as e:
    print(e)  # "missing API key. Pass api_key=… or set PARETA_API_KEY …"

TypeScript

import { Pareta, ParetaError } from "pareta";

try {
  const pa = new Pareta({ apiKey: "" }); // or PARETA_API_KEY unset with Pareta.fromEnv()
} catch (e) {
  if (e instanceof ParetaError) {
    console.log(e.message); // "missing API key. Pass apiKey: … or use Pareta.fromEnv() …"
  }
}

Pre-flight ValueError / TypeError

Some mistakes never become an HTTP call. The SDK validates the obvious ones up front and raises a standard Python exception (ValueError or TypeError), not a ParetaError, because they are programming errors, not server responses.

These are fine to let crash in development; they signal a bug in the call, not a runtime condition to recover from.

Automatic retries

The client retries transient failures for you before raising. You usually do not need a retry loop of your own.

What is retried: status codes 408, 409, 429, 500, 502, 503, 504, plus connection-level errors that happen between attempts. The default budget is max_retries=2 (so up to three attempts total).

Backoff: if the server sent a Retry-After header, the SDK waits that many seconds (capped at 30s). Otherwise it uses exponential backoff with jitter: min(0.5 * 2**attempt, 8.0) + random(0, 0.25) seconds, so roughly 0.5s, then 1s, capped at 8s.
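
The delay rule above can be written out as a small function. This is a sketch of the documented formula rather than the SDK's source: `backoff_delay` is a hypothetical name, `attempt` is assumed to be zero-based (which matches "0.5s, then 1s"), and the `jitter` parameter exists only so the result can be made deterministic.

```python
import random
from typing import Optional


def backoff_delay(
    attempt: int,
    retry_after: Optional[float] = None,
    jitter: Optional[float] = None,
) -> float:
    """Seconds to wait before the next attempt, following the documented rule."""
    if retry_after is not None:
        # A server-provided Retry-After wins, capped at 30 seconds.
        return min(retry_after, 30.0)
    if jitter is None:
        jitter = random.uniform(0.0, 0.25)
    # Exponential growth from 0.5s, capped at 8s, plus a little jitter.
    return min(0.5 * 2 ** attempt, 8.0) + jitter
```

Ignoring jitter, attempts 0 through 4 wait 0.5s, 1s, 2s, 4s, and 8s; every later attempt stays at the 8s cap.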

What is not retried: stable 4xx (400, 401, 402, 403, 404, 422) raise immediately — retrying a bad request or an empty balance won't help. Connection errors on the very first attempt are surfaced as APIConnectionError / APITimeoutError once the budget is exhausted.
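
To make the budget arithmetic concrete, here is a minimal retry loop over the documented status codes. It is a sketch under assumptions, not the client's real code: `send_with_retries` and `RETRYABLE_STATUSES` are hypothetical names, `send` stands in for one HTTP attempt, and the sleep between attempts and connection-error handling are left out for brevity.

```python
from typing import Callable, Tuple

# Status codes the documentation lists as retried automatically.
RETRYABLE_STATUSES = frozenset({408, 409, 429, 500, 502, 503, 504})


def send_with_retries(
    send: Callable[[], Tuple[int, str]],
    max_retries: int = 2,
) -> Tuple[int, str, int]:
    """Call `send()` until the status is not retryable or the budget is spent.

    Returns (status, body, attempts_made).
    """
    attempts = 0
    while True:
        status, body = send()
        attempts += 1
        # Stop on a stable status, or once 1 + max_retries attempts were made.
        if status not in RETRYABLE_STATUSES or attempts > max_retries:
            return status, body, attempts
```

With the default `max_retries=2`, a persistently failing call is attempted three times; a 404 returns after the first attempt.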

Tune the budget per client. Set max_retries=0 to disable retries entirely:

Python

from pareta import Pareta

# More aggressive: up to 6 attempts on transient failures.
pa = Pareta.from_env(max_retries=5)

# No retries — fail fast and handle it yourself.
strict = Pareta.from_env(max_retries=0)

TypeScript

import { Pareta } from "pareta";

// More aggressive: up to 6 attempts on transient failures.
const pa = Pareta.fromEnv({ maxRetries: 5 });

// No retries — fail fast and handle it yourself.
const strict = Pareta.fromEnv({ maxRetries: 0 });

Streaming and retries

Retries apply only to the initial handshake (connect and status line). Once SSE bytes are flowing — token chunks from a streamed chat completion, or progress/complete events from a streamed deploy — a mid-stream drop raises immediately, because the stream cannot be safely resumed. Catch it and restart the request from the top if you need to.

Python

from pareta import Pareta, APIConnectionError

with Pareta.from_env() as pa:
    try:
        for chunk in pa.chat.completions.create(
            model="ep_contract_kie",
            messages=[{"role": "user", "content": "Summarize the contract."}],
            stream=True,
        ):
            piece = chunk.choices[0].delta.content
            if piece:
                print(piece, end="", flush=True)
    except APIConnectionError:
        print("\n[stream dropped; re-issue the request to retry]")

TypeScript

import { Pareta, APIConnectionError } from "pareta";

const pa = Pareta.fromEnv();
try {
  const stream = pa.chat.completions.create({
    model: "ep_contract_kie",
    messages: [{ role: "user", content: "Summarize the contract." }],
    stream: true,
  });
  for await (const chunk of stream) {
    const piece = chunk.choices[0].delta.content;
    if (piece) process.stdout.write(piece);
  }
} catch (e) {
  if (e instanceof APIConnectionError) {
    console.log("\n[stream dropped; re-issue the request to retry]");
  } else {
    throw e;
  }
}
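
If restarting from the top is acceptable, one approach in Python is to buffer the pieces and discard the partial output when the stream drops, so the caller never sees duplicated text. The helper below is a sketch, not an SDK feature: `collect_with_restart` is a hypothetical name, `StreamDropped` stands in for APIConnectionError, and `open_stream` is any callable that issues the request again and yields text pieces.

```python
from typing import Callable, Iterable, List


class StreamDropped(Exception):
    """Stand-in for pareta.APIConnectionError raised mid-stream."""


def collect_with_restart(
    open_stream: Callable[[], Iterable[str]],
    max_restarts: int = 1,
) -> str:
    """Consume a stream fully; on a drop, throw away partial output and start over."""
    for restart in range(max_restarts + 1):
        pieces: List[str] = []  # fresh buffer for every attempt
        try:
            for piece in open_stream():
                pieces.append(piece)
            return "".join(pieces)
        except StreamDropped:
            if restart == max_restarts:
                raise  # restart budget spent, let the caller decide
    raise AssertionError("unreachable")
```

The trade-off is latency: nothing is shown until the whole response has arrived, and each restart is a new request that is metered like any other call.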

Timeouts

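The default per-request timeout is httpx.Timeout(60.0, connect=10.0): 10s to establish the connection and 60s for each of the other phases (read, write, and pool acquisition). Note that httpx applies these limits per phase rather than as one end-to-end deadline. A request that exceeds a limit raises APITimeoutError (a subclass of APIConnectionError) after the retry budget is spent. Override it with any httpx.Timeout (or a bare float):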

Python

import httpx
from pareta import Pareta, APITimeoutError

# 120s default for each phase, 5s to connect; handy for long generations.
pa = Pareta.from_env(timeout=httpx.Timeout(120.0, connect=5.0))

with pa:
    try:
        pa.chat.completions.create(
            model="ep_contract_kie",
            messages=[{"role": "user", "content": "Write a long summary."}],
            max_tokens=4096,
        )
    except APITimeoutError:
        print("Request timed out; consider streaming or a larger timeout.")

TypeScript

import { Pareta, APITimeoutError } from "pareta";

// 120s overall (one budget — there's no separate connect timeout in TS).
const pa = Pareta.fromEnv({ timeout: 120_000 });

try {
  await pa.chat.completions.create({
    model: "ep_contract_kie",
    messages: [{ role: "user", content: "Write a long summary." }],
    max_tokens: 4096,
  });
} catch (e) {
  if (e instanceof APITimeoutError) {
    console.log("Request timed out; consider streaming or a larger timeout.");
  } else {
    throw e;
  }
}

Eval-run wait timeout

evals.runs.create(wait=True) and evals.runs.wait() are different: they poll the run to completion. The timeout parameter there bounds the whole poll loop (default 900s), not a single HTTP request. If the run hasn't reached a terminal status (completed or failed) by the deadline, the poll helper raises a plain ParetaError — the run keeps going server-side, so you can re-retrieve() it later by id.

Python

from pareta import Pareta, ParetaError

with Pareta.from_env() as pa:
    try:
        run = pa.evals.runs.create(
            task="contract-key-fields",
            items=[{"input": "...", "expected": "..."}],
            models=["contract-1", "contract-2"],
            frontier="benchmarked",
            wait=True,
            timeout=600.0,  # give up waiting after 10 minutes
            poll_interval=5.0,
        )
        print(run.status, run.cost)  # e.g. "completed" Decimal("0.42")
    except ParetaError as e:
        print(e)  # "eval run … did not finish within 600s"; poll later with runs.retrieve(id)

TypeScript

import { Pareta, ParetaError } from "pareta";

const pa = Pareta.fromEnv();
try {
  const run = await pa.evals.runs.create({
    task: "contract-key-fields",
    items: [{ input: "...", expected: "..." }],
    models: ["contract-1", "contract-2"],
    frontier: "benchmarked",
    wait: true,
    timeout: 600, // give up waiting after 10 minutes
    pollInterval: 5,
  });
  console.log(run.status, run.cost); // e.g. "completed" "0.42"
} catch (e) {
  if (e instanceof ParetaError) {
    console.log(e.message); // "eval run … did not finish within 600s"; poll later with runs.retrieve(id)
  } else {
    throw e;
  }
}

Note that a run finishing with status == "failed" is not an exception — it's a terminal state you read off the returned EvalRun (run.is_terminal is True, run.error_detail carries the message). Only the wait timeout raises.
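
The distinction between a failed run (returned) and a wait timeout (raised) can be seen in a sketch of such a poll loop. This is an illustration, not the SDK's helper: `wait_for_run` and `WaitTimeout` are hypothetical names (the latter stands in for the plain ParetaError), `get_status` stands in for retrieving the run and reading its status, and the non-terminal status names used in the usage note are assumptions.

```python
import time
from typing import Callable

# Terminal statuses named in the documentation.
TERMINAL_STATUSES = frozenset({"completed", "failed"})


class WaitTimeout(Exception):
    """Stand-in for the plain ParetaError raised when the wait deadline passes."""


def wait_for_run(
    get_status: Callable[[], str],
    timeout: float = 900.0,
    poll_interval: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until the run is terminal; the deadline covers the whole loop."""
    deadline = time.monotonic() + timeout
    while True:
        status = get_status()
        if status in TERMINAL_STATUSES:
            return status  # "failed" is returned, not raised
        if time.monotonic() >= deadline:
            raise WaitTimeout(f"run did not finish within {timeout:g}s")
        sleep(poll_interval)
```

A run that reports, say, "running" twice and then "failed" returns "failed" to the caller; only a run that is still non-terminal at the deadline raises.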

Async

AsyncPareta raises the exact same exception classes; wrap await calls in the same try/except. Retries, backoff, and timeouts behave identically — backoff just uses asyncio.sleep under the hood.

Python

import asyncio
from pareta import AsyncPareta, InsufficientCreditsError, EndpointNotReadyError

async def main():
    async with AsyncPareta.from_env() as pa:
        try:
            resp = await pa.chat.completions.create(
                model="ep_contract_kie",
                messages=[{"role": "user", "content": "Extract the parties."}],
            )
            print(resp.choices[0].message.content)
        except InsufficientCreditsError:
            print("Top up your org balance in the dashboard.")
        except EndpointNotReadyError:
            await pa.endpoints.start("ep_contract_kie")

asyncio.run(main())

TypeScript

// There is no AsyncPareta in TypeScript — the single `Pareta` client is already
// async: every I/O method returns a Promise, so you just `await` it. The same
// exception classes, retries, backoff, and timeouts apply unchanged.
import { Pareta, InsufficientCreditsError, EndpointNotReadyError } from "pareta";

async function main() {
  const pa = Pareta.fromEnv();
  try {
    const resp = await pa.chat.completions.create({
      model: "ep_contract_kie",
      messages: [{ role: "user", content: "Extract the parties." }],
    });
    console.log(resp.choices[0].message.content);
  } catch (e) {
    if (e instanceof InsufficientCreditsError) {
      console.log("Top up your org balance in the dashboard.");
    } else if (e instanceof EndpointNotReadyError) {
      await pa.endpoints.start("ep_contract_kie");
    } else {
      throw e;
    }
  }
}

main();

A layered handler

A practical pattern: catch the few cases you can act on, then fall back to the base class so nothing escapes unhandled.

Python

from pareta import (
    Pareta,
    InsufficientCreditsError,
    EndpointNotReadyError,
    RateLimitError,
    APITimeoutError,
    ParetaError,
)

with Pareta.from_env() as pa:
    try:
        resp = pa.chat.completions.create(
            model="ep_contract_kie",
            messages=[{"role": "user", "content": "Extract the parties."}],
        )
        print(resp.choices[0].message.content)
    except InsufficientCreditsError:
        print("Out of balance; top up in the dashboard.")
    except EndpointNotReadyError:
        pa.endpoints.start("ep_contract_kie")  # warm it, then retry
    except RateLimitError:
        print("Rate limited after retries; back off and try again.")
    except APITimeoutError:
        print("Timed out; raise the timeout or stream the response.")
    except ParetaError as e:
        print(f"Unexpected SDK error: {e}")  # request_id is on APIStatusError subclasses

TypeScript

import {
  Pareta,
  InsufficientCreditsError,
  EndpointNotReadyError,
  RateLimitError,
  APITimeoutError,
  ParetaError,
} from "pareta";

const pa = Pareta.fromEnv();
try {
  const resp = await pa.chat.completions.create({
    model: "ep_contract_kie",
    messages: [{ role: "user", content: "Extract the parties." }],
  });
  console.log(resp.choices[0].message.content);
} catch (e) {
  if (e instanceof InsufficientCreditsError) {
    console.log("Out of balance; top up in the dashboard.");
  } else if (e instanceof EndpointNotReadyError) {
    await pa.endpoints.start("ep_contract_kie"); // warm it, then retry
  } else if (e instanceof RateLimitError) {
    console.log("Rate limited after retries; back off and try again.");
  } else if (e instanceof APITimeoutError) {
    console.log("Timed out; raise the timeout or stream the response.");
  } else if (e instanceof ParetaError) {
    console.log(`Unexpected SDK error: ${e.message}`); // requestId is on APIStatusError subclasses
  } else {
    throw e;
  }
}

See also

  • Inference — OpenAI-compatible chat completions and streaming
  • Endpoints — deploy, start, stop, and the is_live check
  • Evals — eval sets, runs, wait, and run.cost
  • Tasks — the benchmark catalog and match()