Control LLM costs
Set hard caps that refuse a model call before it spends, get notified at 50/80/95%, and recover when a workflow hits its cap.
The problem this solves
A single agent loop can burn through a month's budget in an afternoon. The most common version reads the same way every time: an agent gets handed a tool that returns a bigger payload than expected, the loop iterates, the model keeps calling, and the next morning the operator opens an OpenAI invoice for $847. The cap was set in the provider dashboard but only checked once a day, after the bill was already accruing.
TaskJuice refuses the next call instead of regretting the last one. You set a cap on a workspace, and every model call passes through a check that compares actual spend against the cap before the call goes to the provider. When the cap is hit, the run halts; the provider was never asked. No charge, no surprise, no "we'll send it to support."
This guide walks you through setting a cap, reading the dashboards, recovering a halted run, and bringing your own provider keys so spend hits your account directly.
Set a cap
Open the cost dashboard
From the workspace, go to Analytics → Cost analysis, or visit /<workspaceSlug>/analytics/costs directly. You see three tiles at the top: spend this period, hard cap, and cap status.
Set a hard cap
The hard cap is the dollar amount above which the workspace cannot spend on LLM providers in the current period. The cap is enforced before each call: once actual spend has reached the cap, the next call is refused, not retried.
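The pre-call check can be sketched in a few lines. This is a minimal illustration under stated assumptions, not TaskJuice's implementation. The names Workspace, guarded_call, and CapExceeded are hypothetical. Spend is assumed to be tracked in integer micro-USD (the unit of the microUsdSpent export column), and the provider is stubbed as a callable that returns a response together with its cost. The sketch compares actual spend only. Projection-based warnings are described under "What gets refused, what gets warned".

```python
from dataclasses import dataclass
from typing import Any, Callable, Tuple


class CapExceeded(Exception):
    """Raised in place of a provider call; mirrors the budget.cap_exceeded error code."""

    code = "budget.cap_exceeded"


@dataclass
class Workspace:
    hard_cap_micro_usd: int   # hypothetical: cap stored as integer micro-USD
    spent_micro_usd: int = 0  # actual spend so far this period


def guarded_call(ws: Workspace, call_provider: Callable[[], Tuple[Any, int]]) -> Any:
    """Compare actual spend to the cap *before* the provider is asked."""
    if ws.spent_micro_usd >= ws.hard_cap_micro_usd:
        # Refused, not retried: the provider never receives this request,
        # so there is nothing to charge or reconcile.
        raise CapExceeded(CapExceeded.code)
    response, cost_micro_usd = call_provider()
    ws.spent_micro_usd += cost_micro_usd  # record what the call actually cost
    return response
```

Because the check runs before the request leaves, a refused call never appears on the provider's invoice.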
Pick a soft-cap action (optional)
Soft-cap notifications fire automatically at 50%, 80%, and 95% of the hard cap. Each threshold fires once per period through every channel the workspace has connected: email to every workspace admin, Slack to the workspace's connected webhook, and an in-app banner on the workspace dashboard. You don't configure thresholds individually — they are platform defaults so an operator who just wired Slack can't accidentally silence them.
Once a cap is set, the Cap status tile flips between three states:
- On track — under 80% used. The status is informational.
- >80% used — yellow badge. Email and Slack have already fired the 80% notification once.
- >95% used — red badge. Email and Slack have fired the 95% notification once. The next provider call may be the one that crosses the line.
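The once-per-period notification rule and the Cap status tile can be modeled together. This is a sketch with hypothetical names (PeriodNotifier, cap_status). Delivery to email, Slack, and the in-app banner is left to the caller. One boundary is an assumption: spend at exactly 80% or 95% is treated as having crossed that threshold.

```python
from dataclasses import dataclass, field
from typing import List, Set

THRESHOLDS = (50, 80, 95)  # platform defaults, not configurable per workspace


@dataclass
class PeriodNotifier:
    hard_cap_micro_usd: int
    fired: Set[int] = field(default_factory=set)  # thresholds already sent this period

    def on_spend(self, spent_micro_usd: int) -> List[int]:
        """Return the thresholds that fire for the first time at this spend level."""
        pct = 100 * spent_micro_usd / self.hard_cap_micro_usd
        new = [t for t in THRESHOLDS if pct >= t and t not in self.fired]
        self.fired.update(new)  # each threshold fires at most once per period
        return new  # the caller fans these out to email, Slack, and the in-app banner

    def start_new_period(self) -> None:
        self.fired.clear()


def cap_status(spent_micro_usd: int, hard_cap_micro_usd: int) -> str:
    """Map spend to the Cap status tile (exact-boundary behavior assumed)."""
    pct = 100 * spent_micro_usd / hard_cap_micro_usd
    if pct >= 95:
        return ">95% used"
    if pct >= 80:
        return ">80% used"
    return "On track"
```

A single jump in spend can cross more than one threshold. In that case, every threshold crossed fires once, and none fires again until the next period.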
Read the workspace dashboard
The cost dashboard at /<workspaceSlug>/analytics/costs shows three sections under the headline tiles:
- Spend over time — daily spend for the current period, with a flat reference line at the cap level so you can see exactly when spend started accelerating.
- Top spenders — workflows, runs, and BYOK keys ranked by spend this period. Click a row to drill into the underlying ledger entries for that run.
- Export CSV — pulls the period's per-workflow ledger down as a CSV with columns matching what most agency invoicing workflows expect: period, workspaceId, workspaceName, workflowId, workflowName, microUsdSpent, connectionRef, connectionScope, providerSlug, modelRef. The export filters to production traffic only, so eval and compaction-internal calls don't inflate the rebillable totals.
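The export can be pictured as a filter plus a group-by over ledger entries. This is a sketch only. The traffic field used to separate production from eval, compaction, and judge calls is a hypothetical stand-in for however the ledger actually labels internal traffic. The column names and their order are the ones listed above.

```python
import csv
import io
from typing import Dict, Iterable

COLUMNS = ["period", "workspaceId", "workspaceName", "workflowId", "workflowName",
           "microUsdSpent", "connectionRef", "connectionScope", "providerSlug", "modelRef"]
GROUP_COLUMNS = [c for c in COLUMNS if c != "microUsdSpent"]


def export_csv(entries: Iterable[Dict]) -> str:
    """Sum production ledger entries per workflow/connection/model and emit CSV."""
    totals: Dict[tuple, int] = {}
    for e in entries:
        if e["traffic"] != "production":  # hypothetical field; drops internal traffic
            continue
        key = tuple(e[c] for c in GROUP_COLUMNS)
        totals[key] = totals.get(key, 0) + e["microUsdSpent"]

    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    for key, spent in totals.items():
        row = dict(zip(GROUP_COLUMNS, key))
        row["microUsdSpent"] = spent  # integer micro-USD, summed per group
        writer.writerow(row)
    return buf.getvalue()
```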
Read the agency rollup
The agency-level view at /reports shows every client workspace at once: total agency spend, count of workspaces over 80% of cap, and count over 95%. The per-workspace table sorts by % of cap descending, so the workspaces nearest the line surface first.
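The rollup's headline numbers and table order follow directly from each workspace's spend and cap. This is a minimal sketch with hypothetical names. It assumes every client workspace has a cap set, and it counts a workspace at exactly 80% or 95% as over that line (an assumption).

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class WorkspaceSpend:
    name: str
    spent_micro_usd: int
    cap_micro_usd: int  # assumes every client workspace has a cap set

    @property
    def pct_of_cap(self) -> float:
        return 100 * self.spent_micro_usd / self.cap_micro_usd


def agency_rollup(workspaces: List[WorkspaceSpend]) -> Dict:
    # Nearest-to-the-line first, matching the per-workspace table's sort order.
    table = sorted(workspaces, key=lambda w: w.pct_of_cap, reverse=True)
    return {
        "total_spend_micro_usd": sum(w.spent_micro_usd for w in workspaces),
        "over_80": sum(1 for w in workspaces if w.pct_of_cap >= 80),  # boundary assumed
        "over_95": sum(1 for w in workspaces if w.pct_of_cap >= 95),
        "table": [w.name for w in table],
    }
```

Sorting by percentage rather than by absolute spend is what lets a small client near its cap outrank a large client with plenty of headroom.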
The CSV export at the top of the page is the same shape as the workspace export, aggregated across every workspace. Most agencies use this CSV as the input for their monthly client invoice — the columns map straight into a Google Sheet or QuickBooks import without reformatting.
Recover from a hit cap
When a workflow run halts because the cap was hit, the run viewer shows a recovery banner above the run timeline. The error code is budget.cap_exceeded. The next provider call was refused, not attempted — there is no charge to reconcile.
You have three actions:
- Raise cap — opens the workspace cost dashboard so you can adjust the hard cap. Common pattern: raise the cap by enough to let the in-progress run finish, then revisit the underlying workflow that ran away.
- Resume after raise — re-runs the halted step using the run's existing context, so the model call that was refused is retried with the new cap in place. Only enabled once the cap has been raised above current actual spend.
- Acknowledge — abandons the run and records an audit row. Use it when the right move is to investigate what made the workflow run away, not to keep spending.
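The three recovery actions amount to a small state machine on the halted run. In this sketch, HaltedRun and its methods are hypothetical. The rerun_step callable stands in for re-executing the refused step with the run's existing context. The only rule taken from the text is that Resume after raise is enabled only when the cap is above current actual spend.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class HaltedRun:
    run_id: str
    refused_step: str                  # the step whose model call was refused
    status: str = "budget.cap_exceeded"
    audit_log: List[Dict] = field(default_factory=list)

    def can_resume(self, cap_micro_usd: int, spent_micro_usd: int) -> bool:
        # "Resume after raise" stays disabled until the cap is above actual spend.
        return cap_micro_usd > spent_micro_usd

    def resume(self, cap_micro_usd: int, spent_micro_usd: int,
               rerun_step: Callable[[str], str]) -> str:
        if not self.can_resume(cap_micro_usd, spent_micro_usd):
            raise RuntimeError("raise the cap above current spend before resuming")
        self.status = "resumed"
        # Re-run only the refused step, reusing the run's existing context.
        return rerun_step(self.refused_step)

    def acknowledge(self, user: str) -> None:
        # Abandon the run and leave an audit row behind.
        self.status = "abandoned"
        self.audit_log.append({"run_id": self.run_id, "action": "acknowledge", "by": user})
```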
Caps only tighten as you move down through scopes: the platform default is permissive, the agency default tightens it, and a workspace cap can tighten it further still. A workspace cannot raise its cap above the agency default, so if raising the workspace cap doesn't change the effective cap, check the agency default at /settings/general.
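Because each scope can only tighten, the effective cap is the minimum of whichever caps are set. This is a sketch with a hypothetical function name. Unset scopes are modeled as None.

```python
from typing import Optional


def effective_cap(platform_default: int,
                  agency_default: Optional[int],
                  workspace_cap: Optional[int]) -> int:
    """Each narrower scope can lower the cap but never raise it."""
    cap = platform_default
    for tighter in (agency_default, workspace_cap):
        if tighter is not None:
            cap = min(cap, tighter)
    return cap
```

This is why raising a workspace cap past the agency default has no visible effect. The minimum is still the agency value.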
Bring your own keys
By default, model spend goes through the agency's TaskJuice billing. Most agencies prefer to bring their own provider keys so spend hits the agency's OpenAI or Anthropic account directly and the agency invoices the client itself.
TaskJuice supports both patterns:
- Workspace-scope keys — one key per client workspace, never shared. The agency holds 20 separate Anthropic keys for 20 client workspaces. Manage them at /<workspaceSlug>/integrations/llm-keys.
- Account-scope keys — one shared agency key, granted to N workspaces with optional per-workspace spend caps. The agency holds one Anthropic key, grants it to every client workspace, and per-workspace cost attribution still works because the cost ledger keys on (workspaceId, connectionRef). Manage shared keys at /ai/keys.
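The attribution works because the workspace is part of the ledger key, so one shared connection still splits cleanly by client. This is a minimal sketch that sums ledger entries on that composite key. The function name and entry shape are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def attribute_spend(ledger: Iterable[Dict]) -> Dict[Tuple[str, str], int]:
    """Sum spend per (workspaceId, connectionRef)."""
    totals: Dict[Tuple[str, str], int] = defaultdict(int)
    for entry in ledger:
        totals[(entry["workspaceId"], entry["connectionRef"])] += entry["microUsdSpent"]
    return dict(totals)
```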
Add a key
Click Add key on the keys page. Pick the provider (OpenAI, Anthropic, or Google), give the key a label that matches your internal naming convention, and paste the API key.
Validate against the provider
Before saving, click Validate against provider. TaskJuice runs a tokenizer-only call against the provider — no tokens consumed, no spend. If the key is wrong, you find out before the first real workflow run, not at 3am during an automated job.
Save
On a verified key, the Save button activates. The key is encrypted at rest and never round-tripped to the browser after this dialog closes. The display only ever shows the last four characters of the key's hash.
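Showing characters of a hash rather than of the key means the display never reveals any part of the secret itself. This is an illustrative sketch. The text does not name the hash function, so SHA-256 here is an assumption, as are the function name and the "****" prefix.

```python
import hashlib


def key_fingerprint(api_key: str) -> str:
    """Display string derived from the key's hash, never from the key itself."""
    digest = hashlib.sha256(api_key.encode("utf-8")).hexdigest()  # SHA-256 assumed
    return "****" + digest[-4:]
```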
Rotate a key
Click Rotate on any key to swap in a new value with a 24-hour overlap window. New calls use the new key immediately; in-flight runs holding the old key complete on it; after 24 hours the old key is hard-deleted. Type-to-confirm guards prevent accidental rotations.
For account-scope keys, a single rotation propagates atomically to every workspace the key is granted to. You don't need to update each workspace separately.
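The rotation window can be modeled as a slot holding the current value and, for 24 hours, the previous one. In this sketch, KeySlot and its methods are hypothetical. Account-scope grants are modeled as references to one shared slot, which is one way to get the single-rotation propagation described above. One behavior is an assumption: a run that is still in flight after the window closes falls through to the new key.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

OVERLAP = timedelta(hours=24)


@dataclass
class KeySlot:
    current: str
    previous: Optional[str] = None
    rotated_at: Optional[datetime] = None

    def rotate(self, new_value: str, now: datetime) -> None:
        # New calls switch immediately; the old value is kept for the overlap window.
        self.previous, self.current, self.rotated_at = self.current, new_value, now

    def purge(self, now: datetime) -> None:
        if self.previous is not None and now - self.rotated_at >= OVERLAP:
            self.previous = None  # hard-delete the old value after 24 hours

    def key_for(self, run_pinned_key: Optional[str], now: datetime) -> str:
        """New calls get the current key; in-flight runs keep the old one inside the window."""
        self.purge(now)
        if run_pinned_key is not None and run_pinned_key == self.previous:
            return self.previous
        return self.current
```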
Grant an account key to a workspace
On the agency keys page at /ai/keys, each row shows a workspace count. Click into a key to see the per-workspace grant list and add or revoke grants. Each grant can carry a per-workspace monthly spend cap — useful when you want a shared agency key but a particular client should never spend more than $200/month on it.
When you revoke a grant, in-flight runs in that workspace finish on whatever key they were already using. New runs in that workspace fall back to the workspace's own keys, or to the platform-billed default if the workspace has no other LLM connection.
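The fallback order for new runs, and the optional per-grant cap, can be sketched as two small functions. All names here are hypothetical. The text does not say which workspace key is chosen when several exist, so picking the first one is an assumption. $200 per month is expressed as 200,000,000 micro-USD.

```python
from dataclasses import dataclass, field
from typing import List, Optional

PLATFORM_BILLED = "platform-billed-default"  # hypothetical marker for the platform default


@dataclass
class Grant:
    key_ref: str
    monthly_cap_micro_usd: Optional[int] = None  # optional per-workspace cap on the shared key


@dataclass
class WorkspaceKeys:
    grant: Optional[Grant] = None
    own_keys: List[str] = field(default_factory=list)


def key_for_new_run(ws: WorkspaceKeys) -> str:
    """In-flight runs keep their key; only new runs go through this lookup."""
    if ws.grant is not None:
        return ws.grant.key_ref
    if ws.own_keys:
        return ws.own_keys[0]  # assumption: the first workspace key wins
    return PLATFORM_BILLED


def grant_allows(grant: Grant, spent_this_month_micro_usd: int) -> bool:
    return (grant.monthly_cap_micro_usd is None
            or spent_this_month_micro_usd < grant.monthly_cap_micro_usd)
```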
What gets refused, what gets warned
There are two kinds of cap events. The dashboard treats them differently because they mean different things:
- Soft warn — the projection for the next call would push spend over the cap, but actual spend is still under. The run continues. A yellow inline indicator appears on the run viewer's "spend tracking near cap" surface, and the 95% notification fires through email and Slack. Operators see the signal without an interrupt.
- Hard kill — actual spend has crossed the cap, OR the iteration-zero worst-case projection for an agent loop already exceeds remaining budget. The run halts with budget.cap_exceeded. The recovery banner appears on the run viewer.
The two-tier semantic is deliberate. A loop whose worst-case sits close to the cap would otherwise kill on every iteration, training operators to set caps absurdly high to escape false positives. Soft warns let the loop run to completion when it actually fits; hard kills only fire on real overruns.
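The two tiers reduce to a short decision function. This is a minimal sketch with hypothetical names. It assumes amounts in micro-USD and treats spend exactly at the cap as crossed. Note that the worst-case check runs only at iteration zero, which is what keeps a loop that actually fits from being killed on later iterations.

```python
from enum import Enum
from typing import Optional


class CapEvent(Enum):
    OK = "ok"
    SOFT_WARN = "soft_warn"
    HARD_KILL = "budget.cap_exceeded"


def classify(spent: int, cap: int, next_call_projection: int, iteration: int,
             loop_worst_case: Optional[int] = None) -> CapEvent:
    remaining = cap - spent
    if spent >= cap:
        return CapEvent.HARD_KILL   # actual spend has crossed the cap
    if iteration == 0 and loop_worst_case is not None and loop_worst_case > remaining:
        return CapEvent.HARD_KILL   # the loop cannot fit even before it starts
    if spent + next_call_projection > cap:
        return CapEvent.SOFT_WARN   # projected overrun only: warn, keep running
    return CapEvent.OK
```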
Common pitfalls
- The cap doesn't include eval, compaction, or judge traffic. Those are platform-internal and never count against the workspace cap. The CSV export filters them out as well so rebillable totals match what you actually owe the provider.
- Notifications fire once per period, per threshold. If you mute the 80% notification and then hit 95%, the 95% notification still fires. Muting is per threshold, not global.
- A halted run is still a counted run. A run that hits budget.cap_exceeded shows up in the runs list as failed. The cost ledger has no entry for the refused call, but the run-viewer event log records the cap-fired event so the audit trail is complete.
Next steps
- Set up a custom domain so client-facing URLs match your brand.
- Wire the agency rollup into your monthly invoicing workflow with the CSV export.