Errors, retries & timeouts

Every failure the SDK can raise is a subclass of ParetaError, so one except clause catches everything, and a more specific clause catches exactly the case you care about. The client also retries transient failures for you (network blips, 429s, 5xx) with exponential backoff before giving up. This page is the map: which exception means what, what is retried automatically, and how to tune the timeout and retry budget.

Import the exceptions straight from the package:

Python

from pareta import (
    Pareta,
    ParetaError,                # base class for everything below
    APIConnectionError,         # never reached the server (DNS/TCP/TLS)
    APITimeoutError,            # subclass of APIConnectionError
    APIStatusError,             # any non-2xx from the server
    BadRequestError,            # 400, 422
    AuthenticationError,        # 401
    PermissionDeniedError,      # 403
    InsufficientCreditsError,   # 402 — org out of balance
    NotFoundError,              # 404
    ConflictError,              # 409
    RateLimitError,             # 429
    EndpointNotReadyError,      # 503 — endpoint stopped/cold/provider down
)

TypeScript

import {
  Pareta,
  ParetaError,                // base class for everything below
  APIConnectionError,         // never reached the server (DNS/TCP/TLS)
  APITimeoutError,            // subclass of APIConnectionError
  APIStatusError,             // any non-2xx from the server
  BadRequestError,            // 400, 422
  AuthenticationError,        // 401
  PermissionDeniedError,      // 403
  InsufficientCreditsError,   // 402 — org out of balance
  NotFoundError,              // 404
  ConflictError,              // 409
  RateLimitError,             // 429
  EndpointNotReadyError,      // 503 — endpoint stopped/cold/provider down
} from "pareta";

The hierarchy

ParetaError
├── APIConnectionError          request never reached the server
│   └── APITimeoutError         timed out before any response
└── APIStatusError              server returned a non-2xx status
    ├── BadRequestError         400, 422
    ├── AuthenticationError     401
    ├── InsufficientCreditsError 402
    ├── PermissionDeniedError   403
    ├── NotFoundError           404
    ├── ConflictError           409
    ├── RateLimitError          429
    └── EndpointNotReadyError   503

ParetaError is also raised directly (not as an APIStatusError) in two non-HTTP cases: constructing a client with no API key, and an evals.runs.wait() poll loop that exceeds its timeout. See Timeouts below.

Status code to exception

The server is FastAPI, so error bodies are {"detail": "<message>"} with an HTTP status. The SDK maps the status to the most specific subclass so you catch by meaning, not by sniffing integers.

Status	Exception	What it means
400, 422	`BadRequestError`	Request validation failed (bad params, malformed body)
401	`AuthenticationError`	API key missing or invalid
402	`InsufficientCreditsError`	Org is out of balance; top up in the dashboard
403	`PermissionDeniedError`	Authenticated, but not allowed to do this
404	`NotFoundError`	Endpoint / eval set / run / task id does not exist
409	`ConflictError`	Conflict (seed endpoint, transient lock/contention)
429	`RateLimitError`	Rate limited; honor `Retry-After`
503	`EndpointNotReadyError`	Target endpoint is stopped, cold-starting, or its provider is down
other 5xx	`APIStatusError`	Generic server error

Reading an `APIStatusError`

Every APIStatusError carries the fields you need to log and debug. request_id comes from the x-request-id response header and is the fastest thing to quote in a support thread.

Python

from pareta import Pareta, APIStatusError

with Pareta.from_env() as pa:
    try:
        pa.endpoints.retrieve("ep_does_not_exist")
    except APIStatusError as e:
        print(e.status_code)   # 404
        print(e.detail)        # server's `detail` string (or raw body)
        print(e.request_id)    # "req_…" — quote this in bug reports
        print(e.response)      # the underlying httpx.Response, for advanced use

TypeScript

import { Pareta, APIStatusError } from "pareta";

const pa = Pareta.fromEnv();
try {
  await pa.endpoints.retrieve("ep_does_not_exist");
} catch (e) {
  if (e instanceof APIStatusError) {
    console.log(e.status);     // 404
    console.log(e.detail);     // server's `detail` string (or raw body)
    console.log(e.requestId);  // "req_…" — quote this in bug reports
    console.log(e.response);   // the underlying fetch Response, for advanced use
  }
}

str(e) is the server's detail message when present, otherwise HTTP <code>.

The errors worth catching

Most code only needs to handle a handful of these explicitly. The rest are fine to let bubble up to a top-level except ParetaError.

`InsufficientCreditsError` (402) — out of balance

Both inference and evals are metered against your org's balance. A successful chat.completions.create() debits the balance; an evals.runs.create() debits for the open and frontier compute it runs. When the balance can't cover the call, you get a 402. Top-up is browser-only — the SDK exposes no balance or payment surface — so the right move is to surface a clear message pointing at the dashboard.

Python

from pareta import Pareta, InsufficientCreditsError

with Pareta.from_env() as pa:
    try:
        resp = pa.chat.completions.create(
            model="ep_contract_kie",
            messages=[{"role": "user", "content": "Extract the parties."}],
        )
    except InsufficientCreditsError:
        raise SystemExit("Org balance is empty. Top up at https://pareta.ai dashboard.")

TypeScript

import { Pareta, InsufficientCreditsError } from "pareta";

const pa = Pareta.fromEnv();
try {
  const resp = await pa.chat.completions.create({
    model: "ep_contract_kie",
    messages: [{ role: "user", content: "Extract the parties." }],
  });
} catch (e) {
  if (e instanceof InsufficientCreditsError) {
    throw new Error("Org balance is empty. Top up at https://pareta.ai dashboard.");
  }
  throw e;
}

`NotFoundError` (404) — wrong id

A stale or mistyped endpoint id, eval set id, run id, or task id.

Python

from pareta import Pareta, NotFoundError

with Pareta.from_env() as pa:
    try:
        ep = pa.endpoints.retrieve("ep_maybe_deleted")
    except NotFoundError:
        ep = pa.endpoints.deploy(task="contract-key-fields", wait=True)  # redeploy

TypeScript

import { Pareta, NotFoundError } from "pareta";

const pa = Pareta.fromEnv();
let ep;
try {
  ep = await pa.endpoints.retrieve("ep_maybe_deleted");
} catch (e) {
  if (e instanceof NotFoundError) {
    ep = await pa.endpoints.deploy({ task: "contract-key-fields", wait: true });  // redeploy
  } else {
    throw e;
  }
}

`EndpointNotReadyError` (503) — endpoint not serving

The endpoint exists but isn't serving yet: it's stopped, cold-starting, or the provider is briefly unavailable. The SDK already retries 503 a couple of times (see Automatic retries); if it still surfaces, start the endpoint and retry your call. Remember that hardware is fully managed — start() takes no GPU knob, just the endpoint id.

Python

from pareta import Pareta, EndpointNotReadyError

with Pareta.from_env() as pa:
    try:
        resp = pa.chat.completions.create(
            model="ep_contract_kie",
            messages=[{"role": "user", "content": "ping"}],
        )
    except EndpointNotReadyError:
        pa.endpoints.start("ep_contract_kie")          # warm it back up
        ep = pa.endpoints.retrieve("ep_contract_kie")  # poll until ep.is_live, then retry

TypeScript

import { Pareta, EndpointNotReadyError } from "pareta";

const pa = Pareta.fromEnv();
try {
  const resp = await pa.chat.completions.create({
    model: "ep_contract_kie",
    messages: [{ role: "user", content: "ping" }],
  });
} catch (e) {
  if (e instanceof EndpointNotReadyError) {
    await pa.endpoints.start("ep_contract_kie");           // warm it back up
    const ep = await pa.endpoints.retrieve("ep_contract_kie");  // poll until ep.isLive, then retry
  } else {
    throw e;
  }
}

`RateLimitError` (429) — slow down

Already retried automatically, honoring the server's Retry-After. You only see it after the retry budget is exhausted. Back off and try again later.

Python

from pareta import Pareta, RateLimitError

with Pareta.from_env() as pa:
    try:
        pa.chat.completions.create(
            model="ep_contract_kie",
            messages=[{"role": "user", "content": "hi"}],
        )
    except RateLimitError as e:
        print(f"Still rate limited after retries (request {e.request_id}); back off.")

TypeScript

import { Pareta, RateLimitError } from "pareta";

const pa = Pareta.fromEnv();
try {
  await pa.chat.completions.create({
    model: "ep_contract_kie",
    messages: [{ role: "user", content: "hi" }],
  });
} catch (e) {
  if (e instanceof RateLimitError) {
    console.log(`Still rate limited after retries (request ${e.requestId}); back off.`);
  } else {
    throw e;
  }
}

`AuthenticationError` (401) vs missing key

A 401 means the key reached the server and was rejected (wrong or revoked). That is distinct from constructing a client with no key at all, which fails fast client-side with a plain ParetaError before any request goes out:

Python

import pareta

try:
    pa = pareta.Pareta(api_key="")   # or PARETA_API_KEY unset with from_env()
except pareta.ParetaError as e:
    print(e)  # "missing API key. Pass api_key=… or set PARETA_API_KEY …"

TypeScript

import { Pareta, ParetaError } from "pareta";

try {
  const pa = new Pareta({ apiKey: "" });   // or PARETA_API_KEY unset with Pareta.fromEnv()
} catch (e) {
  if (e instanceof ParetaError) {
    console.log(e.message);  // "missing API key. Pass apiKey: … or use Pareta.fromEnv() …"
  }
}

Pre-flight `ValueError` / `TypeError`

Some mistakes never become an HTTP call. The SDK validates the obvious ones up front and raises the standard Python exception — not a ParetaError — because they are programming errors, not server responses:

chat.completions.create() raises ValueError if model or messages is empty.
tasks.match() raises ValueError if query is empty.
evals.sets.create() raises ValueError if items is empty.
evals.runs.create() raises ValueError if neither eval_set= nor task=+items= is supplied, and ValueError/TypeError if frontier= is an unparseable keyword or a frontier keyword can't be resolved to a task.
evals.sets.upload_document() raises TypeError if file is not a path, bytes, or a binary file-like object.

These are fine to let crash in development; they signal a bug in the call, not a runtime condition to recover from.

Automatic retries

The client retries transient failures for you before raising. You usually do not need a retry loop of your own.

What is retried: status codes 408, 409, 429, 500, 502, 503, 504, plus connection-level errors that happen between attempts. The default budget is max_retries=2 (so up to three attempts total).

Backoff: if the server sent a Retry-After header, the SDK waits that many seconds (capped at 30s). Otherwise it uses exponential backoff with jitter: min(0.5 * 2**attempt, 8.0) + random(0, 0.25) seconds, so roughly 0.5s, then 1s, capped at 8s.

What is not retried: stable 4xx (400, 401, 402, 403, 404, 422) raise immediately — retrying a bad request or an empty balance won't help. Connection errors on the very first attempt are surfaced as APIConnectionError / APITimeoutError once the budget is exhausted.

Tune the budget per client. Set max_retries=0 to disable retries entirely:

Python

from pareta import Pareta

# More aggressive: up to 6 attempts on transient failures.
pa = Pareta.from_env(max_retries=5)

# No retries — fail fast and handle it yourself.
strict = Pareta.from_env(max_retries=0)

TypeScript

import { Pareta } from "pareta";

// More aggressive: up to 6 attempts on transient failures.
const pa = Pareta.fromEnv({ maxRetries: 5 });

// No retries — fail fast and handle it yourself.
const strict = Pareta.fromEnv({ maxRetries: 0 });

Streaming and retries

Retries apply only to the initial handshake (connect and status line). Once SSE bytes are flowing — token chunks from a streamed chat completion, or progress/complete events from a streamed deploy — a mid-stream drop raises immediately, because the stream cannot be safely resumed. Catch it and restart the request from the top if you need to.

Python

from pareta import Pareta, APIConnectionError

with Pareta.from_env() as pa:
    try:
        for chunk in pa.chat.completions.create(
            model="ep_contract_kie",
            messages=[{"role": "user", "content": "Summarize the contract."}],
            stream=True,
        ):
            piece = chunk.choices[0].delta.content
            if piece:
                print(piece, end="", flush=True)
    except APIConnectionError:
        print("\n[stream dropped — re-issue the request to retry]")

TypeScript

import { Pareta, APIConnectionError } from "pareta";

const pa = Pareta.fromEnv();
try {
  const stream = pa.chat.completions.create({
    model: "ep_contract_kie",
    messages: [{ role: "user", content: "Summarize the contract." }],
    stream: true,
  });
  for await (const chunk of stream) {
    const piece = chunk.choices[0].delta.content;
    if (piece) process.stdout.write(piece);
  }
} catch (e) {
  if (e instanceof APIConnectionError) {
    console.log("\n[stream dropped — re-issue the request to retry]");
  } else {
    throw e;
  }
}

Timeouts

The default per-request timeout is httpx.Timeout(60.0, connect=10.0): 60s overall, 10s to establish the connection. A request that exceeds it raises APITimeoutError (a subclass of APIConnectionError) after the retry budget is spent. Override it with any httpx.Timeout (or a bare float):

Python

import httpx
from pareta import Pareta, APITimeoutError

# 120s overall, 5s to connect — handy for long generations.
pa = Pareta.from_env(timeout=httpx.Timeout(120.0, connect=5.0))

with pa:
    try:
        pa.chat.completions.create(
            model="ep_contract_kie",
            messages=[{"role": "user", "content": "Write a long summary."}],
            max_tokens=4096,
        )
    except APITimeoutError:
        print("Request timed out; consider streaming or a larger timeout.")

TypeScript

import { Pareta, APITimeoutError } from "pareta";

// 120s overall (one budget — there's no separate connect timeout in TS).
const pa = Pareta.fromEnv({ timeout: 120_000 });

try {
  await pa.chat.completions.create({
    model: "ep_contract_kie",
    messages: [{ role: "user", content: "Write a long summary." }],
    max_tokens: 4096,
  });
} catch (e) {
  if (e instanceof APITimeoutError) {
    console.log("Request timed out; consider streaming or a larger timeout.");
  } else {
    throw e;
  }
}

Eval-run wait timeout

evals.runs.create(wait=True) and evals.runs.wait() are different: they poll the run to completion. The timeout parameter there bounds the whole poll loop (default 900s), not a single HTTP request. If the run hasn't reached a terminal status (completed or failed) by the deadline, the poll helper raises a plain ParetaError — the run keeps going server-side, so you can re-retrieve() it later by id.

Python

from pareta import Pareta, ParetaError

with Pareta.from_env() as pa:
    try:
        run = pa.evals.runs.create(
            task="contract-key-fields",
            items=[{"input": "...", "expected": "..."}],
            models=["contract-1", "contract-2"],
            frontier="benchmarked",
            wait=True,
            timeout=600.0,      # give up waiting after 10 minutes
            poll_interval=5.0,
        )
        print(run.status, run.cost)        # e.g. "completed" Decimal("0.42")
    except ParetaError as e:
        print(e)  # "eval run … did not finish within 600s" — poll later with runs.retrieve(id)

TypeScript

import { Pareta, ParetaError } from "pareta";

const pa = Pareta.fromEnv();
try {
  const run = await pa.evals.runs.create({
    task: "contract-key-fields",
    items: [{ input: "...", expected: "..." }],
    models: ["contract-1", "contract-2"],
    frontier: "benchmarked",
    wait: true,
    timeout: 600,        // give up waiting after 10 minutes
    pollInterval: 5,
  });
  console.log(run.status, run.cost);   // e.g. "completed" "0.42"
} catch (e) {
  if (e instanceof ParetaError) {
    console.log(e.message);  // "eval run … did not finish within 600s" — poll later with runs.retrieve(id)
  } else {
    throw e;
  }
}

Note that a run finishing with status == "failed" is not an exception — it's a terminal state you read off the returned EvalRun (run.is_terminal is True, run.error_detail carries the message). Only the wait timeout raises.

Async

AsyncPareta raises the exact same exception classes; wrap await calls in the same try/except. Retries, backoff, and timeouts behave identically — backoff just uses asyncio.sleep under the hood.

Python

import asyncio
from pareta import AsyncPareta, InsufficientCreditsError, EndpointNotReadyError

async def main():
    async with AsyncPareta.from_env() as pa:
        try:
            resp = await pa.chat.completions.create(
                model="ep_contract_kie",
                messages=[{"role": "user", "content": "Extract the parties."}],
            )
            print(resp.choices[0].message.content)
        except InsufficientCreditsError:
            print("Top up your org balance in the dashboard.")
        except EndpointNotReadyError:
            await pa.endpoints.start("ep_contract_kie")

asyncio.run(main())

TypeScript

// There is no AsyncPareta in TypeScript — the single `Pareta` client is already
// async: every I/O method returns a Promise, so you just `await` it. The same
// exception classes, retries, backoff, and timeouts apply unchanged.
import { Pareta, InsufficientCreditsError, EndpointNotReadyError } from "pareta";

async function main() {
  const pa = Pareta.fromEnv();
  try {
    const resp = await pa.chat.completions.create({
      model: "ep_contract_kie",
      messages: [{ role: "user", content: "Extract the parties." }],
    });
    console.log(resp.choices[0].message.content);
  } catch (e) {
    if (e instanceof InsufficientCreditsError) {
      console.log("Top up your org balance in the dashboard.");
    } else if (e instanceof EndpointNotReadyError) {
      await pa.endpoints.start("ep_contract_kie");
    } else {
      throw e;
    }
  }
}

main();

A layered handler

A practical pattern: catch the few cases you can act on, then fall back to the base class so nothing escapes unhandled.

Python

from pareta import (
    Pareta,
    InsufficientCreditsError,
    EndpointNotReadyError,
    RateLimitError,
    APITimeoutError,
    ParetaError,
)

with Pareta.from_env() as pa:
    try:
        resp = pa.chat.completions.create(
            model="ep_contract_kie",
            messages=[{"role": "user", "content": "Extract the parties."}],
        )
        print(resp.choices[0].message.content)
    except InsufficientCreditsError:
        print("Out of balance — top up in the dashboard.")
    except EndpointNotReadyError:
        pa.endpoints.start("ep_contract_kie")  # warm it, then retry
    except RateLimitError:
        print("Rate limited after retries — back off and try again.")
    except APITimeoutError:
        print("Timed out — raise the timeout or stream the response.")
    except ParetaError as e:
        print(f"Unexpected SDK error: {e}")  # request_id is on APIStatusError subclasses

TypeScript

import {
  Pareta,
  InsufficientCreditsError,
  EndpointNotReadyError,
  RateLimitError,
  APITimeoutError,
  ParetaError,
} from "pareta";

const pa = Pareta.fromEnv();
try {
  const resp = await pa.chat.completions.create({
    model: "ep_contract_kie",
    messages: [{ role: "user", content: "Extract the parties." }],
  });
  console.log(resp.choices[0].message.content);
} catch (e) {
  if (e instanceof InsufficientCreditsError) {
    console.log("Out of balance — top up in the dashboard.");
  } else if (e instanceof EndpointNotReadyError) {
    await pa.endpoints.start("ep_contract_kie");  // warm it, then retry
  } else if (e instanceof RateLimitError) {
    console.log("Rate limited after retries — back off and try again.");
  } else if (e instanceof APITimeoutError) {
    console.log("Timed out — raise the timeout or stream the response.");
  } else if (e instanceof ParetaError) {
    console.log(`Unexpected SDK error: ${e.message}`);  // requestId is on APIStatusError subclasses
  } else {
    throw e;
  }
}

The hierarchy​

Status code to exception​

Reading an APIStatusError​

The errors worth catching​

InsufficientCreditsError (402) — out of balance​

NotFoundError (404) — wrong id​

EndpointNotReadyError (503) — endpoint not serving​

RateLimitError (429) — slow down​

AuthenticationError (401) vs missing key​

Pre-flight ValueError / TypeError​

Automatic retries​

Streaming and retries​

Timeouts​

Eval-run wait timeout​

Async​

A layered handler​

See also​

The hierarchy

Status code to exception

Reading an `APIStatusError`

The errors worth catching

`InsufficientCreditsError` (402) — out of balance

`NotFoundError` (404) — wrong id

`EndpointNotReadyError` (503) — endpoint not serving

`RateLimitError` (429) — slow down

`AuthenticationError` (401) vs missing key

Pre-flight `ValueError` / `TypeError`

Automatic retries

Streaming and retries

Timeouts

Eval-run wait timeout

Async

A layered handler

See also