Client (Pareta, AsyncPareta)
The client is the one object you build and the only thing that talks to the network. It holds your API key, the environment URL, the timeout and retry policy, and an HTTP connection pool. Every call you make goes through it: deploying endpoints, running inference, browsing the catalog, evaluating models. There are two of them and they are mirror images: Pareta is synchronous, AsyncPareta is async/await. Pick one, build it once, reuse it.
from pareta import Pareta
with Pareta.from_env() as pa:  # reads PARETA_API_KEY
    print(pa.models.list())
Nothing else in the SDK is constructed directly. Resources like pa.chat, pa.endpoints, and pa.evals are attributes that hang off the client; you never instantiate them yourself.
Build it from the environment
from_env() is the recommended constructor. It reads PARETA_API_KEY and an optional PARETA_BASE_URL, then builds the client for you. It keeps pareta_sk_… secrets out of source and lets the same code run against production or staging by flipping one environment variable.
export PARETA_API_KEY="pareta_sk_live_…"
from pareta import Pareta, AsyncPareta
pa = Pareta.from_env() # sync
apa = AsyncPareta.from_env() # async — same call, async client
@classmethod
Pareta.from_env(**kwargs) -> Pareta
AsyncPareta.from_env(**kwargs) -> AsyncPareta
from_env() forwards any extra keyword arguments straight to the constructor, so you can keep the key in the environment and still override the rest in code:
pa = Pareta.from_env(max_retries=5, timeout=120.0)
An explicit api_key= or base_url= passed to from_env() wins over the environment variable of the same name.
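The precedence described above can be pictured with a small stand-in. This is a sketch, not the SDK's source: resolve_settings is a hypothetical helper, the environment is passed in as a plain dict so the example runs anywhere, and treating an explicit None as "not passed" is an assumption.

```python
import os

DEFAULT_BASE_URL = "https://api.pareta.ai"  # production default from the parameter table


def resolve_settings(environ=None, **kwargs):
    """Hypothetical stand-in for the precedence from_env() is described as applying."""
    env = os.environ if environ is None else environ
    settings = dict(kwargs)  # extra keywords (timeout, max_retries, ...) pass through
    # An explicit argument wins; otherwise fall back to the environment variable.
    if settings.get("api_key") is None:
        settings["api_key"] = env.get("PARETA_API_KEY")
    if settings.get("base_url") is None:
        settings["base_url"] = env.get("PARETA_BASE_URL", DEFAULT_BASE_URL)
    return settings


# Placeholder values only; no real key or staging URL is implied.
env = {"PARETA_API_KEY": "key-from-env", "PARETA_BASE_URL": "https://staging.example.test"}
from_environment = resolve_settings(env, max_retries=5)
overridden = resolve_settings(env, api_key="key-from-code")
print(from_environment["api_key"], from_environment["max_retries"])
print(overridden["api_key"], overridden["base_url"])
```

The second call shows the override in action: the key comes from code while base_url still comes from the environment.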
Construct it directly
When you are not driving config from the environment, call the constructor. Both clients take the same arguments; they differ only in the type of http_client.
from pareta import Pareta
pa = Pareta(
    api_key="pareta_sk_live_…",
    base_url="https://api.pareta.ai",
    timeout=60.0,
    max_retries=2,
    http_client=None,
)
Pareta(
    api_key: str | None = None,
    base_url: str | None = None,
    timeout=None,
    max_retries: int = 2,  # DEFAULT_MAX_RETRIES
    http_client: httpx.Client | None = None,
)
AsyncPareta(
    api_key: str | None = None,
    base_url: str | None = None,
    timeout=None,
    max_retries: int = 2,
    http_client: httpx.AsyncClient | None = None,
)
| Parameter | Type | Default | What it does |
|---|---|---|---|
| api_key | str \| None | None | Your pareta_sk_… key. Sent as Authorization: Bearer <key>. Required (raises ParetaError if missing). |
| base_url | str \| None | "https://api.pareta.ai" | API root. Normalized with rstrip("/"). Pass the staging URL to point at staging. |
| timeout | httpx.Timeout \| float \| None | httpx.Timeout(60.0, connect=10.0) | Per-request HTTP timeout. |
| max_retries | int | 2 | Automatic retries on transient failures. Clamped to >= 0. |
| http_client | httpx.Client \| httpx.AsyncClient \| None | None | Bring your own httpx client (proxies, custom transports, pools). |
api_key
The key is the one piece of config you cannot skip. The SDK sends it as a Bearer token on every request. Mint keys in the dashboard; key management is browser-only and the SDK only ever consumes a key.
If the key is falsy (and PARETA_API_KEY is unset when using from_env()), the constructor raises ParetaError before any network call:
from pareta import Pareta, ParetaError
try:
    pa = Pareta(api_key="")
except ParetaError as e:
    print(e)
    # missing API key. Pass api_key=… or set PARETA_API_KEY
    # (mint a pareta_sk_ key in the dashboard).
A key that is present but rejected by the server surfaces as AuthenticationError (401) on the first request, not at construction time.
base_url
base_url selects the environment. It defaults to production and is normalized with a trailing-slash strip, so https://api.pareta.ai/ and https://api.pareta.ai behave identically. Keys are environment-scoped: pair each base_url with a key minted for that environment.
prod = Pareta(api_key="pareta_sk_live_…") # default
staging = Pareta(api_key="pareta_sk_test_…", base_url="https://api-staging.pareta.ai")
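The trailing-slash normalization is easy to check in isolation. The following is a minimal sketch using hypothetical helpers (normalize_base_url and join_path are not SDK functions, and the /example path is a placeholder rather than a real route).

```python
DEFAULT_BASE_URL = "https://api.pareta.ai"  # documented production default


def normalize_base_url(base_url=None):
    """Hypothetical helper: fall back to the default, then strip trailing slashes."""
    return (base_url or DEFAULT_BASE_URL).rstrip("/")


def join_path(base_url, path):
    # With the root normalized, joining a path never yields a double slash.
    return normalize_base_url(base_url) + "/" + path.lstrip("/")


with_slash = normalize_base_url("https://api.pareta.ai/")
without_slash = normalize_base_url("https://api.pareta.ai")
print(with_slash == without_slash)
print(join_path("https://api.pareta.ai/", "/example"))
```

Normalizing once at construction means every later URL join can assume a root without a trailing slash.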
timeout
Caps how long a single request may take. The default httpx.Timeout(60.0, connect=10.0) allows up to 10 seconds to establish a connection and 60 seconds for each of the other phases (read, write, and acquiring a pooled connection); httpx applies these limits per phase, not as one total deadline. A bare float applies the same limit to every phase, connect included. Raise it for long completions, or stream the response so tokens arrive incrementally (see Inference). Note that evals.runs.create(..., wait=True) has its own timeout argument governing the poll loop, separate from this HTTP timeout (see Evals).
import httpx
from pareta import Pareta
pa = Pareta(api_key="pareta_sk_live_…", timeout=httpx.Timeout(120.0, connect=10.0))
max_retries
The SDK automatically retries transient failures: HTTP 408, 409, 429, 500, 502, 503, 504. The default is 2 (up to 3 attempts). Backoff is exponential with jitter, capped at 8 seconds, and honors a server Retry-After header when present. Non-transient errors (401, 402, 404, and so on) raise on the first attempt. Once a stream's bytes are flowing, a mid-stream drop raises immediately and is not retried. See Errors and retries.
pa = Pareta(api_key="pareta_sk_live_…", max_retries=5) # patient batch job
pa = Pareta(api_key="pareta_sk_live_…", max_retries=0) # fail fast (tests)
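To make the retry policy concrete, here is an illustrative sketch of the decision and the delay calculation. It is not the SDK's implementation: the 0.5 second base delay, the jitter range, and the handling of a non-numeric Retry-After are assumptions, and all three function names are hypothetical. Only the status list, the 8 second cap, and the clamp to zero come from the text above.

```python
import random

RETRYABLE_STATUSES = frozenset({408, 409, 429, 500, 502, 503, 504})  # documented list
MAX_DELAY_SECONDS = 8.0    # documented cap
BASE_DELAY_SECONDS = 0.5   # assumption: the base delay is not documented


def total_attempts(max_retries):
    # Negative values clamp to zero retries, which means exactly one attempt.
    return max(0, max_retries) + 1


def should_retry(status_code, retries_used, max_retries):
    return status_code in RETRYABLE_STATUSES and retries_used < max(0, max_retries)


def retry_delay(retries_used, retry_after=None, rng=random.random):
    """Seconds to wait before the next attempt (illustrative only)."""
    if retry_after is not None:
        try:
            seconds = float(retry_after)
            if seconds >= 0:
                return seconds  # honor the server's hint
        except ValueError:
            pass  # e.g. an HTTP-date; this sketch falls back to backoff
    ceiling = min(MAX_DELAY_SECONDS, BASE_DELAY_SECONDS * (2 ** retries_used))
    return ceiling * (0.5 + 0.5 * rng())  # jitter within [ceiling / 2, ceiling)


print(total_attempts(2), should_retry(429, 0, 2), should_retry(404, 0, 2))
print(retry_delay(0, retry_after="3"))
```

Injecting rng keeps the jitter deterministic in tests, which is one reason to pass the random source as a parameter.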
http_client
By default the client constructs its own httpx client (configured with your timeout) and closes it for you. Pass http_client= to control the transport layer: an outbound proxy, mTLS, shared connection pools, or test doubles.
import httpx
from pareta import Pareta
my_client = httpx.Client(
    proxy="http://proxy.internal:8080",
    limits=httpx.Limits(max_connections=50, max_keepalive_connections=10),
    timeout=httpx.Timeout(120.0, connect=10.0),
)
pa = Pareta(api_key="pareta_sk_live_…", http_client=my_client)
When you inject a client, you own its lifecycle and its timeout. pa.close() will not close a client you passed in, and the constructor's timeout= applies only to an SDK-owned client. Set the timeout on your own client, and close it yourself.
Lifecycle and cleanup
Each client owns an HTTP connection pool. Release it when you are done. The idiomatic way is the context manager, which closes the pool on exit.
Sync
close() -> None # close the HTTP client (only if the SDK owns it)
__enter__() -> Pareta
__exit__(*exc) -> None
from pareta import Pareta
with Pareta.from_env() as pa:
    completion = pa.chat.completions.create(
        model="ep_a1b2c3",
        messages=[{"role": "user", "content": "Extract the parties."}],
    )
    print(completion.choices[0].message.content)
# HTTP client closed on exit
Or close it explicitly:
pa = Pareta.from_env()
try:
    pa.models.list()
finally:
    pa.close()
Async
async aclose() -> None
async __aenter__() -> AsyncPareta
async __aexit__(*exc) -> None
import asyncio
from pareta import AsyncPareta
async def main():
    async with AsyncPareta.from_env() as pa:
        models = await pa.models.list()
        print(models)
    # HTTP client closed on exit

asyncio.run(main())
The ownership rule holds in both: if you passed http_client=, neither close()/aclose() nor exiting the context manager touches it. Close your own client.
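The ownership rule is a common pattern and can be sketched without any network code. Everything below is a hypothetical stand-in (FakeHTTPClient and SketchClient are not SDK classes); it only illustrates the "close what you created, leave what you were given" behavior described above.

```python
class FakeHTTPClient:
    """Stand-in for an httpx client so the sketch runs without dependencies."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True


class SketchClient:
    """Hypothetical illustration of the ownership rule; not the SDK's source."""

    def __init__(self, http_client=None):
        self._owns_http = http_client is None  # True only when we built the transport
        self._http = FakeHTTPClient() if http_client is None else http_client

    def close(self):
        if self._owns_http:  # never close an injected client
            self._http.close()

    def __enter__(self):
        return self

    def __exit__(self, *exc):
        self.close()


shared = FakeHTTPClient()
with SketchClient(http_client=shared):
    pass
injected_closed_on_exit = shared.closed  # the caller still owns this one

with SketchClient() as owned:
    pass
owned_closed_on_exit = owned._http.closed  # built internally, so closed on exit

shared.close()  # the caller closes what the caller created
print(injected_closed_on_exit, owned_closed_on_exit)  # False True
```

Recording ownership once in the constructor keeps close() and the context manager in agreement, since both go through the same check.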
Resource namespaces
The client is a namespace router. Every capability hangs off it as an attribute. The sync client exposes the sync resources; the async client exposes the async mirrors. Method shapes match one-to-one; on the async client, methods are async def and streaming methods return async iterators.
| Namespace | Sync type | Async type | What it does | Reference |
|---|---|---|---|---|
| chat | Chat | AsyncChat | OpenAI-compatible inference via chat.completions.create(...). Metered. | chat |
| models | Models | AsyncModels | models.list(): the deployed, callable endpoints (OpenAI-compatible subset). | models |
| endpoints | Endpoints | AsyncEndpoints | deploy, list, retrieve, start, stop, delete, and metrics(id). | endpoints |
| tasks | Tasks | AsyncTasks | Browse the benchmark catalog: list, retrieve, match, leaderboard, recommended. | tasks |
| evals | Evals | AsyncEvals | evals.sets, evals.runs, and evals.frontier_models(...). Metered. | evals |
A tour of all five against one client:
from pareta import Pareta
with Pareta.from_env() as pa:
    # tasks: discover what to deploy
    task = "contract-key-fields"
    print("recommended:", pa.tasks.recommended(task))  # a per-task alias

    # endpoints: deploy it (wait=True blocks through the deploy SSE stream)
    ep = pa.endpoints.deploy(task=task, model="recommended", wait=True)
    print("live:", ep.id, ep.status)

    # chat: OpenAI-compatible inference against the endpoint id
    resp = pa.chat.completions.create(
        model=ep.id,
        messages=[{"role": "user", "content": "Say hello."}],
    )
    print(resp.choices[0].message.content)

    # models: the OpenAI-compatible list of callable endpoints
    for m in pa.models.list():
        print(m.id, m.owned_by)

    # evals: score candidates on your own data; the billed total is in dollars
    run = pa.evals.runs.create(
        task=task,
        items=[{"input": "…", "expected": "…"}],
        models=["qwen-vl-2"],
        frontier="benchmarked",
        wait=True,
    )
    print("run cost:", run.cost)  # Decimal dollars, floored to cents
The same code on the async client, with await and the async context manager:
import asyncio
from pareta import AsyncPareta
async def main():
    async with AsyncPareta.from_env() as pa:
        ep = await pa.endpoints.deploy(
            task="contract-key-fields", model="recommended", wait=True,
        )
        resp = await pa.chat.completions.create(
            model=ep.id,
            messages=[{"role": "user", "content": "Say hello."}],
        )
        print(resp.choices[0].message.content)

asyncio.run(main())
Two async differences worth pinning down: pa.endpoints.metrics(id) returns an AsyncMetrics object directly (not a coroutine), and its dimension methods are the things you await. See Async for the full sync-vs-async mapping.
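The shape of that metrics call is easy to get wrong, so here is a runnable stand-in for where the await belongs. The classes and the example_dimension method are hypothetical placeholders (the real dimension method names are documented on the endpoints page, not here); only the "plain call returns an object, its methods are awaited" pattern is taken from the text above.

```python
import asyncio


class SketchAsyncMetrics:
    """Hypothetical stand-in for the object returned by metrics(id)."""

    def __init__(self, endpoint_id):
        self.endpoint_id = endpoint_id

    async def example_dimension(self):
        # A real dimension method would perform an HTTP request here.
        await asyncio.sleep(0)
        return {"endpoint": self.endpoint_id, "value": 0}


class SketchAsyncEndpoints:
    def metrics(self, endpoint_id):
        # Plain def: returns the handle directly, so there is nothing to await.
        return SketchAsyncMetrics(endpoint_id)


async def main():
    endpoints = SketchAsyncEndpoints()
    metrics = endpoints.metrics("ep_a1b2c3")  # no await here
    return await metrics.example_dimension()  # the await goes here


result = asyncio.run(main())
print(result)
```

Writing await on the first call would fail in this sketch, because the returned object is not awaitable.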
Platform truths the client makes concrete
These hold no matter how you build the client. They are why there is no GPU knob, no balance API, and no real model ids in the SDK.
- GPUs are hidden. You configure a key, a URL, timeouts, and retries, never hardware. endpoints.deploy(task=…, model=…) takes a task and a model alias; Pareta resolves the GPU, tensor-parallelism, and quantization from its registry. There is no hardware parameter anywhere in the SDK. See Deploy endpoints.
- Models are per-task aliases. Every model id you pass or read (in deploy(model=…), on tasks.leaderboard() rows, in run.results[].model_id, in endpoints.list()[].model) is a per-task public alias like qwen-vl-2. Real internal ids never cross into the SDK. Frontier (vendor) ids are in the clear. See Discovery.
- Inference and evals are metered against your org balance. A successful pa.chat.completions.create() debits your balance; pa.evals.runs.create() debits for both open and frontier compute. An EvalRun reports its billed total on run.cost (a Decimal in dollars, floored to whole cents, so a sub-cent run reads Decimal("0.00")) and the raw value on run.cost_micro_usd. When the balance hits zero, both paths raise InsufficientCreditsError (402). Top-up is browser-only; the SDK exposes neither balance nor payment methods.

  from pareta import InsufficientCreditsError
  try:
      pa.chat.completions.create(model=ep.id, messages=[{"role": "user", "content": "ping"}])
  except InsufficientCreditsError:
      print("Out of credit: top up in the dashboard.")

- Inference is OpenAI-compatible. base_url plus your pareta_sk_… key is a drop-in OpenAI endpoint. You can point the openai SDK at the same base_url to call a deployed endpoint; this SDK adds the control plane (deploy, eval, discovery) the openai client cannot do. See Inference.
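The relationship between run.cost and run.cost_micro_usd in the metering item above comes down to a small piece of Decimal arithmetic. The sketch below assumes that cost_micro_usd is an integer count of millionths of a dollar (the name suggests this, but the unit is not spelled out here), and dollars_floored_to_cents is a hypothetical helper rather than an SDK function.

```python
from decimal import Decimal, ROUND_DOWN

MICRO_USD_PER_DOLLAR = Decimal(1_000_000)  # assumption: micro means millionths of a dollar


def dollars_floored_to_cents(cost_micro_usd):
    """Convert an integer micro-USD amount to Decimal dollars, rounded down to whole cents."""
    dollars = Decimal(cost_micro_usd) / MICRO_USD_PER_DOLLAR
    # ROUND_DOWN truncates toward zero, which equals a floor for non-negative costs.
    return dollars.quantize(Decimal("0.01"), rounding=ROUND_DOWN)


print(dollars_floored_to_cents(4_999))      # a sub-cent run: 0.00
print(dollars_floored_to_cents(1_239_999))  # just under $1.24: 1.23
```

If you need to sum many small runs accurately, summing the raw micro-USD values and converting once avoids losing the sub-cent remainders.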
See also
- Configuration: the full configuration guide, covering from_env, base_url, timeouts, retries, custom transports, and the configuration cookbook.
- Inference: chat.completions.create, streaming, and metering.
- Deploy endpoints: deploy a model to a task and operate it.
- Discovery: browse the catalog, match intent, read leaderboards.
- Evaluation: score models on your own data, including run.cost.
- Errors and retries: the ParetaError hierarchy and retry behavior.
- Async: the sync-vs-async mapping for every resource.