Walk a failed run and decide if it is retryable

Open a failed run, find the first failed step, read its error code and provider status, and decide whether a replay will help.

When a run stops with a red Failed badge, one step inside it errored and the run halted there. Your job is to find that step, read the structured error it produced, and decide one thing: is this worth re-running, or do you need to change something first.

This walkthrough takes you through a single failed run end to end. By the end you will be able to read any failed run the same way, because every step's error follows the same shape no matter which app raised it.

What a failed run is telling you

A run fails when one step returns a structured error instead of an output. The run does not continue past that step, so the failed step is the one you care about: every step before it succeeded, and everything after it never ran.

That structured error carries five fields, and the run detail page renders all of them. You will read the first four on every failure; the fifth, detail, matters only when you escalate:

  • code tells you what kind of failure it was, in a machine-readable form like AUTH_MISSING or TIMEOUT. This is the headline.
  • message is the human-readable summary of what went wrong.
  • providerStatus is the HTTP status the upstream app returned, when there was one (for example 429 or 500). It tells you whether the failure came from the other service or from inside the step.
  • retryable is a true or false signal for whether running the step again is likely to succeed. The page shows it as a (retryable) tag next to the provider status.
  • detail is extra technical text, tucked behind a disclosure for when you need to send something to support.
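As a rough mental model, the five fields above can be written as a TypeScript type. The field names come from this page; the type name StepError, the types, the optionality, and the sample values are assumptions for illustration, not a published TaskJuice schema.

```typescript
// Hypothetical shape of a step's structured error.
// Field names follow this page; types and optionality are assumptions.
interface StepError {
  code: string;            // machine-readable kind, e.g. "AUTH_MISSING" or "TIMEOUT"
  message: string;         // human-readable summary
  providerStatus?: number; // upstream HTTP status; absent when there was no upstream response
  retryable: boolean;      // whether running the step again is likely to succeed
  detail?: string;         // longer technical text, useful for support requests
}

// Illustrative value only: a step with no working connection.
// The message text is made up for this sketch.
const exampleError: StepError = {
  code: "AUTH_MISSING",
  message: "No connection is attached to this step.",
  retryable: false,
};

// Pick out the three fields that drive the replay decision.
function replaySignals(e: StepError) {
  return {
    code: e.code,
    providerStatus: e.providerStatus,
    retryable: e.retryable,
  };
}
```

Here replaySignals(exampleError) has no providerStatus, which matches a failure that did not come from an upstream HTTP response.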

The whole question of "should I replay this?" comes down to reading code, providerStatus, and retryable together. The rest of this page walks one run so you can see how.

Open the run and jump to the failed step

  1. Open the failed run

    Go to Monitoring in the workspace sidebar, then Runs. That opens the run list at /{workspaceSlug}/monitoring/workflow-runs.

    Set the Status filter to Failed to narrow the list to runs that stopped on an error, then click the Run ID of the run you want to inspect. Each Run ID is the first 8 characters of the full id, shown as a monospace link.

    The run detail page opens at /{workspaceSlug}/monitoring/workflow-runs/{workflowId}/{runId}.

  2. Confirm the run failed and read the top-level error

    At the top of the run detail page, the status badge reads Failed. Just below the overview, an alert titled Execution Error shows the run's error string in monospace. This is the run-level summary; the per-step error you will act on lives in the timeline below it.

  3. Jump to the first failed step

    Scroll to the Execution Steps section. The header shows a count of steps and badges for how many completed, are running, failed, are waiting, or are paused.

    When a step failed, a Jump to failed step link appears, and the first failed step is already expanded for you. Click the link, or just read the open step. This is the step that stopped the run.

Each step row shows a status icon, its number (1-based), its title, and a kind badge naming the node type (for example http.request). The failed step carries the red failed status.
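The navigation above can be sketched in a few helpers: shortening a run ID the way the list does, building the run detail path, and locating the first failed step with its 1-based number. The Step type and all function names are hypothetical, and whether the URL takes the full run id or the short one is an assumption of this sketch.

```typescript
// Hypothetical step record; status names mirror the badges in the Execution Steps header.
type StepStatus = "completed" | "running" | "failed" | "waiting" | "paused";

interface Step {
  title: string;
  kind: string; // node type, e.g. "http.request"
  status: StepStatus;
}

// The run list shows the first 8 characters of the full id.
function shortRunId(fullId: string): string {
  return fullId.slice(0, 8);
}

// Build the run detail path from its three parts.
function runDetailPath(workspaceSlug: string, workflowId: string, runId: string): string {
  return `/${workspaceSlug}/monitoring/workflow-runs/${workflowId}/${runId}`;
}

// Return the first failed step and its 1-based number, or null if nothing failed.
function firstFailedStep(steps: Step[]): { number: number; step: Step } | null {
  const step = steps.find((s) => s.status === "failed");
  if (!step) return null;
  return { number: steps.indexOf(step) + 1, step };
}
```

Because the run halts at the first error, the step this returns is the one that stopped the run.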

Read the error envelope

Inside the expanded failed step, the error renders as a red alert. Read it top to bottom.

The alert title is the error code. If the step raised a known kind of failure, you see the code itself, like AUTH_MISSING or TIMEOUT. If the underlying failure had no specific code, the title reads the literal word Error instead (that happens when the code is the catch-all UNKNOWN_ERROR).

The alert body is the message, in monospace. Read it as the plain-language version of what the code is telling you.

If the upstream app returned an HTTP status, a line below the message reads:

Provider status: 429 (retryable)

The number is the providerStatus. The (retryable) tag appears only when the error's retryable field is true. If you see no provider status line at all, the failure did not come from an upstream HTTP response (for example, a missing connection or a validation error inside the step).

If the runtime is already re-attempting the step on its own, you also see:

Retrying in 12s

That means TaskJuice scheduled an automatic retry; you do not need to replay it yourself yet. Refresh the run to see whether the retry succeeded.

For anything you need to escalate, expand Technical details under the error. It holds the longer detail text and repeats the code as Code: <code> so you can copy both into a support request.
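The rendering rules in this section can be summarized as two small functions. This is a sketch of the behavior described above, not the page's actual implementation; the StepError type and the function names are hypothetical.

```typescript
// Hypothetical error shape; field names follow this page.
interface StepError {
  code: string;
  message: string;
  providerStatus?: number;
  retryable: boolean;
  detail?: string;
}

// The alert title is the code, except for the catch-all, which shows as "Error".
function alertTitle(e: StepError): string {
  return e.code === "UNKNOWN_ERROR" ? "Error" : e.code;
}

// The status line exists only when the upstream app returned an HTTP status.
// The "(retryable)" tag is appended only when retryable is true.
function providerStatusLine(e: StepError): string | null {
  if (e.providerStatus === undefined) return null;
  const tag = e.retryable ? " (retryable)" : "";
  return `Provider status: ${e.providerStatus}${tag}`;
}
```

A null result from providerStatusLine corresponds to the case where no provider status line appears at all, so the failure came from inside the step.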

Codes are open-ended, not a fixed list

TaskJuice does not publish a closed set of error codes. Most codes come straight from the app or HTTP layer that failed, so you will see provider-specific values. Read the code as a label that pairs with the message, and lean on providerStatus and retryable for the retry decision rather than memorizing codes.

Decide whether to replay

You now have three signals. Combine them.

Replaying the run as-is makes sense when the failure looks transient: the step hit a temporary condition and may succeed on a fresh attempt. The clearest signal is the (retryable) tag. Provider status 429 (rate limited) and statuses in the 5xx range (the upstream app had a server-side problem) are the usual transient cases, and they often carry (retryable).

Replaying without changing anything will not help when the failure is a configuration or authorization problem, because the same input will produce the same error. A 4xx provider status other than 429 (for example 401 unauthorized, 403 forbidden, or 404 not found), or a code like AUTH_MISSING, points at something you need to fix first: a broken connection, a wrong field mapping, or a resource that does not exist. Fix the workflow, then replay.

Use this as your decision table:

| What you see | Likely meaning | Your move |
| --- | --- | --- |
| (retryable) tag present | The runtime considers another attempt sensible | Replay as-is |
| Retrying in {N}s shown | An automatic retry is already scheduled | Wait and refresh; do not replay yet |
| Provider status 429 | The upstream app rate-limited the request | Replay as-is, or slow the workflow down |
| Provider status 5xx | The upstream app had a server-side error | Replay as-is after a short wait |
| Provider status 401 / 403 | The connection is not authorized | Reconnect the integration, then replay |
| Code AUTH_MISSING | No working connection on the step | Fix the connection, then replay |
| Provider status 404 or a validation message | A reference or mapping is wrong | Fix the workflow, then replay |

Replay lives on a different surface, and it is gated

The Monitoring run detail page lets you pause, resume, and cancel a run, but it does not have a Replay button. Replay runs from the rich run viewer and is offered only for terminal runs (status completed, failed, or cancelled); a run that is still pending or running cannot be replayed. See the related pages below for the full replay flow and its limits.
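The decision rules and the replay gate in this section can be sketched as code. The table does not define a precedence between its rows, so the order of the checks below is an assumption, as are the type names, the function names, and the conservative fallback for cases the table does not list.

```typescript
// Replay is offered only for terminal runs.
const TERMINAL_STATUSES = new Set(["completed", "failed", "cancelled"]);

function canReplay(runStatus: string): boolean {
  return TERMINAL_STATUSES.has(runStatus);
}

// Hypothetical bundle of what you read off the failed step.
interface Signals {
  code: string;
  providerStatus?: number;
  retryable: boolean;
  retryScheduled: boolean; // true when a "Retrying in {N}s" line is shown
}

type Move =
  | "wait-and-refresh"
  | "replay-as-is"
  | "reconnect-then-replay"
  | "fix-connection-then-replay"
  | "fix-workflow-then-replay";

function decideMove(s: Signals): Move {
  // An automatic retry is already scheduled: do not replay yet.
  if (s.retryScheduled) return "wait-and-refresh";
  // Authorization problems repeat on replay, so fix them first.
  if (s.code === "AUTH_MISSING") return "fix-connection-then-replay";
  if (s.providerStatus === 401 || s.providerStatus === 403) return "reconnect-then-replay";
  // Transient signals: the retryable flag, a 429, or any 5xx.
  const status = s.providerStatus;
  const is5xx = status !== undefined && status >= 500 && status <= 599;
  if (s.retryable || status === 429 || is5xx) return "replay-as-is";
  // Assumed fallback: treat everything else (404, validation errors) as fix-first.
  return "fix-workflow-then-replay";
}
```

Checking authorization before the retryable flag is a deliberate choice in this sketch: replaying with a broken connection would produce the same error, whatever the flag says.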

A worked example: a rate-limited Slack post

Say a workflow that posts to Slack on each new lead fails. You open the run list, set the Status filter to Failed, and click the run's Run ID. The status badge reads Failed, and the Jump to failed step link takes you to step 3, an http.request node titled "Post to Slack channel".

Its error alert reads:

  • Title (the code): a provider-derived code for the failed call
  • Message: a short summary of what the call returned
  • A line: Provider status: 429 (retryable)

The 429 says Slack rate-limited the request, and (retryable) confirms a fresh attempt is reasonable. There is no Retrying in {N}s line, so the runtime is not auto-retrying this one. Your move: replay the run as-is. If the same 429 keeps recurring, the underlying problem is volume, and the fix is in the workflow (slow the trigger or batch the posts), not in the replay button.

Now contrast it with a 401 on the same step. Same code-message-status reading, but Provider status: 401 with no (retryable) tag means the Slack connection is no longer authorized. Replaying as-is would fail again with the same 401. You reconnect the integration first, then replay.

What this page does not cover

This walkthrough is about reading one already-failed run. It does not cover:

  • Runs that paused or are waiting rather than failed. A paused or waiting run did not error; it is parked. See the run and step statuses reference for what each status means, and why a run paused for the four reasons a run stops short of failing.
  • The replay flow itself, and its limits (you can run at most 20 replays per minute per workspace, and a replay chain caps at depth 5). See replay and re-run.

Related pages

  • Read one run in detail covers the full run detail page: overview fields, the trigger payload, and the step timeline's Input, Output, and Console sections.
  • Run and step statuses defines every run and step status and the next action for each.
  • Replay and re-run walks the replay modes and the exact limits.