⚙️ Safety Thresholds
Configure alert thresholds for cost and evaluation signals. Changes apply on next sync. Observability thresholds are controlled by CloudWatch alarm configurations.
💰 Cost Thresholds
Default Budget Amount ($): Monthly budget for all agents (updates AWS Budgets on save).
Budget Warning %: Triggers a warning when budget usage exceeds this percentage.
Budget Critical %: Triggers a critical alert when budget usage exceeds this percentage.
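The warning and critical percentages can be read as two cut-offs on budget usage. The sketch below illustrates that reading only; the function name `budget_severity` and the example values of 80 and 95 are assumptions, not settings taken from this page.

```python
def budget_severity(spent: float, budget: float,
                    warning_pct: float, critical_pct: float) -> str:
    """Classify budget usage as 'ok', 'warning', or 'critical'.

    Hypothetical helper: a threshold is breached only when usage
    strictly exceeds it, matching the wording "exceeds this".
    """
    if budget <= 0:
        raise ValueError("budget must be positive")
    # Multiply before dividing to avoid float artifacts such as 0.2 * 100.
    usage_pct = spent * 100 / budget
    if usage_pct > critical_pct:
        return "critical"
    if usage_pct > warning_pct:
        return "warning"
    return "ok"


# Example with assumed thresholds of 80% (warning) and 95% (critical).
print(budget_severity(850, 1000, warning_pct=80, critical_pct=95))
```

Checking the critical threshold first ensures that usage above both cut-offs reports the more severe level.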
🧪 Evaluation Thresholds
Configure when the evaluation alarm fires for each agent. One CloudWatch alarm is created per agent that monitors bad responses across 7 built-in evaluators.
Per-Evaluator Bad Count Thresholds
Maximum bad responses allowed before the alarm fires. Set to 0 to exclude an evaluator from the alarm. Minimum active value: 1.
Harmfulness: Max harmful responses. This is the most sensitive evaluator; even 1 harmful response is typically critical. Min: 1
Correctness: Max incorrect or partially correct responses. Counts both "Incorrect" and "Partially Correct" labels. Min: 1
Goal Success Rate: Max goal failures (the agent failed to achieve the user's goal). Min: 1
Helpfulness: Max unhelpful responses. Dashboard display only; does not affect the CloudWatch alarm.
Faithfulness: Max unfaithful responses (hallucinations). Dashboard display only; does not affect the CloudWatch alarm.
Tool Selection Accuracy: Max times the agent picked the wrong tool for the task. Min: 1
Tool Parameter Accuracy: Max times the agent passed wrong parameters to a tool. Min: 1
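The per-evaluator rules above (0 excludes an evaluator, Helpfulness and Faithfulness are display only, Correctness counts two labels) can be sketched as follows. All names here are hypothetical. The comparison `>=` is an assumption: it treats a threshold of 1 as "fire on the first bad response", which matches the Harmfulness note; if the threshold is meant as a maximum that is still allowed, use `>` instead.

```python
# Evaluators that only drive dashboard display, per the descriptions above.
DISPLAY_ONLY = {"Helpfulness", "Faithfulness"}

# Correctness labels that count as a bad response.
BAD_CORRECTNESS_LABELS = {"Incorrect", "Partially Correct"}


def count_bad_correctness(labels: list[str]) -> int:
    """Count responses labeled 'Incorrect' or 'Partially Correct'."""
    return sum(1 for label in labels if label in BAD_CORRECTNESS_LABELS)


def breached_evaluators(bad_counts: dict[str, int],
                        thresholds: dict[str, int]) -> list[str]:
    """Return evaluators whose bad count reaches their threshold.

    Assumption: a breach means bad count >= threshold. A threshold of 0
    excludes the evaluator, and display-only evaluators never contribute.
    """
    breached = []
    for name, limit in thresholds.items():
        if limit < 0:
            raise ValueError(f"threshold for {name} must be >= 0")
        if name in DISPLAY_ONLY or limit == 0:
            continue
        if bad_counts.get(name, 0) >= limit:
            breached.append(name)
    return breached


thresholds = {"Harmfulness": 1, "Correctness": 3,
              "Helpfulness": 1, "Tool Selection Accuracy": 0}
bad_counts = {"Harmfulness": 1, "Correctness": 2,
              "Helpfulness": 5, "Tool Selection Accuracy": 9}
print(breached_evaluators(bad_counts, thresholds))
```

In this example only Harmfulness breaches: Correctness is below its threshold, Helpfulness is display only, and Tool Selection Accuracy is excluded by its 0 threshold.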
Dashboard Display Thresholds
Controls severity badges in dashboard tables. Uses bad percentage (bad ÷ total × 100), not raw counts. These do not affect CloudWatch alarms.
Dashboard Warning %: Show ⚠️ warning badge when bad % exceeds this value (default: 20%)
Dashboard Critical %: Show 🔴 critical badge when bad % exceeds this value (default: 50%)
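As a worked example of the bad-percentage formula (bad ÷ total × 100), the sketch below maps counts to a badge level using the stated defaults of 20% and 50%. The function name `dashboard_badge` and the string return values are assumptions made for illustration; treating zero total responses as "no badge" is also an assumption.

```python
from typing import Optional


def dashboard_badge(bad: int, total: int,
                    warning_pct: float = 20.0,
                    critical_pct: float = 50.0) -> Optional[str]:
    """Return 'critical', 'warning', or None for a bad/total pair.

    Hypothetical helper: uses the bad percentage, not the raw count,
    and a badge is shown only when the percentage strictly exceeds
    the threshold.
    """
    if total <= 0:
        return None  # assumption: nothing to grade, so no badge
    # Multiply first so that, for example, 2 of 10 is exactly 20.0.
    bad_pct = bad * 100 / total
    if bad_pct > critical_pct:
        return "critical"
    if bad_pct > warning_pct:
        return "warning"
    return None


print(dashboard_badge(3, 10))  # 30% bad
print(dashboard_badge(6, 10))  # 60% bad
```

Because the badge depends on a percentage, 3 bad responses out of 10 rates higher than 3 out of 100, which is the difference from the count-based alarm thresholds.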
📡 Observability Thresholds
Per-Metric Thresholds
Latency (ms): Alarm when average response time exceeds this (default: 10000 ms = 10 s)
Error Count: Alarm when errors per period exceed this (default: 5)
Token Usage: Alarm when tokens per period exceed this (default: 100000)
Invocation Count: Alarm when invocations per period exceed this (default: 200)
Evaluation Window
Evaluation Periods: Number of 5-min periods to evaluate (default: 3)
Datapoints to Alarm: How many periods must breach before the alarm fires (default: 2). Must be ≤ Evaluation Periods
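The evaluation window describes an "M out of N" rule: with the defaults, the alarm fires when at least 2 of the last 3 five-minute periods breach the threshold. The sketch below simulates that rule locally. `EvaluationWindow` and `in_alarm` are hypothetical names, and the sketch ignores missing-data handling and other CloudWatch behavior.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvaluationWindow:
    """Window settings, using the defaults shown above."""
    evaluation_periods: int = 3
    datapoints_to_alarm: int = 2

    def __post_init__(self) -> None:
        if self.evaluation_periods < 1:
            raise ValueError("evaluation_periods must be >= 1")
        # Enforce the constraint: Datapoints to Alarm <= Evaluation Periods.
        if not 1 <= self.datapoints_to_alarm <= self.evaluation_periods:
            raise ValueError(
                "datapoints_to_alarm must be between 1 and evaluation_periods")


def in_alarm(values: list[float], threshold: float,
             window: EvaluationWindow = EvaluationWindow()) -> bool:
    """Return True if enough of the most recent periods exceed threshold.

    `values` holds one datapoint per period, oldest first.
    """
    recent = values[-window.evaluation_periods:]
    breaching = sum(1 for value in recent if value > threshold)
    return breaching >= window.datapoints_to_alarm


# Average latency per 5-min period, against the 10000 ms default.
latencies_ms = [9000, 12000, 8000, 11000]
print(in_alarm(latencies_ms, threshold=10000))
```

Here the last three periods are 12000, 8000, and 11000 ms; two of them exceed 10000 ms, so the default 2-of-3 window reports an alarm. Requiring more than one breaching period reduces alarms caused by a single spike.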