Examples
Complete, runnable workflows for real jobs, in Python and TypeScript. Each page is self-contained and uses the real SDK surface end to end (Pareta.from_env() / Pareta.fromEnv()): deploy a model (no GPU knob), call it with OpenAI-compatible inference, and read the metered cost in dollars off run.cost. Grouped by what you are trying to do.
Deploy and call a model
You know the task; you want a live endpoint and a response.
- Deploy a model and call it — the two-call workflow:
endpoints.deploy(task, model="recommended", wait=True)thenchat.completions.create(model=endpoint.id, ...). Covers deploy events, streaming, metering andInsufficientCreditsError, errors, endpoint ops, and async.
Pick the right model first
You have a job in plain English, or your own data, and want to deploy the model that actually wins on it.
- From a sentence to a deployed winner — the full funnel:
tasks.matchtoleaderboardtoevals.runson your own data, pick the bestkind == "open"model,endpoints.deployit, then run inference. - Benchmark models on your own data — build an eval set from your rows, run open candidates against
frontier="benchmarked", and read ranked results plusrun.cost. - Document extraction (PDF/image) — the blob-task loop: build an eval set from your PDFs/images,
upload_documentper row, run against open candidates plus vision frontier baselines, pick the winner by quality and cost, deploy, then run OpenAI-compatible inference.
Inference patterns
Getting tokens out efficiently.
- Streaming chat completions — stream tokens with
chat.completions.create(stream=True): iterateChatCompletionChunkobjects, readdelta.content, accumulate full text, plus async streaming and metering behavior. - Concurrent calls with AsyncPareta — fire many inference and eval calls concurrently with
AsyncParetaandasyncio.gather, with semaphore backpressure and per-task error handling.
Operate and monitor
Watching what is deployed, and what it costs.
- Cost & quality monitoring — read what calls and eval runs cost, the open-vs-frontier savings framing, and watch a live endpoint's spend and quality via
endpoints.metrics().
Migrating in
Already on the OpenAI SDK.
- Migrating from the OpenAI SDK — keep using the
openaiclient against Pareta (base_url+pareta_sk_key), and when to switch to theparetaSDK for deploy, eval, and discovery.