Troubleshooting
Use structured decision paths to isolate failures, reduce time-to-resolution and escalate with complete evidence.
What this solves
Provides a repeatable incident-handling model so teams can diagnose issues quickly without ad-hoc guesswork.
Who is this for
- On-call and incident response teams
- QA and platform operators handling scan failures
- Managers coordinating cross-team remediation
Prerequisites
- Incident owner assigned
- Relevant run IDs and timestamps collected
- Affected surface and impact scope documented
Step-by-step
1. Classify failure domain
Identify whether the issue is scan execution, queue/workflow, analytics read path or integration contract.
2. Check primary health signals
Review queue depth, recent failures, and endpoint status for immediate containment actions.
3. Apply domain-specific decision tree
Follow yes/no branches to isolate root cause and determine next operational action.
4. Escalate with evidence package
Include incident timeline, payload samples and attempted remediation steps in escalation handoff.
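As an illustration of step 2, the health-signal review can be sketched as a small triage function. This is a minimal sketch, not a product API: the `HealthSnapshot` fields, the threshold values, and the action strings are all hypothetical and should be replaced with your own baselines and runbook wording.

```python
from dataclasses import dataclass


@dataclass
class HealthSnapshot:
    """Hypothetical container for the three primary health signals."""
    queue_depth: int        # jobs currently waiting in the queue
    recent_failures: int    # failed runs inside the lookback window
    endpoint_ok: bool       # result of an endpoint status probe


# Assumed thresholds for illustration only; tune to your own baseline.
MAX_QUEUE_DEPTH = 500
MAX_RECENT_FAILURES = 10


def containment_actions(snapshot: HealthSnapshot) -> list[str]:
    """Return immediate containment actions suggested by the signals."""
    actions = []
    if not snapshot.endpoint_ok:
        actions.append("confirm endpoint outage and notify incident owner")
    if snapshot.queue_depth > MAX_QUEUE_DEPTH:
        actions.append("pause new scan submissions until queue drains")
    if snapshot.recent_failures > MAX_RECENT_FAILURES:
        actions.append("hold retries and sample run-level errors")
    return actions
```

An empty result means none of the three signals calls for containment, so the next step is the domain-specific decision tree.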
Operational outputs
- Incident classification and owner assignment
- Root-cause hypothesis log with validation steps
- Escalation-ready evidence bundle
Plan availability
- Troubleshooting guidance is available across plans
- Operational telemetry depth depends on plan-level observability surfaces
- Enterprise support models can include expanded response workflows
Limits and guardrails
- Do not skip incident classification before remediation
- Avoid parallel fixes without a named owner and a change log
- Escalate when repeated remediation attempts exceed your internal timeout policy
Expected outcome
- Incident diagnosis becomes faster and less noisy
- Escalations are actionable on first handoff
- Post-incident learning is captured for future prevention
Troubleshooting paths
- Use the decision tree to select first response action
- Route queue saturation to workflow and capacity owners
- Route schema or payload issues to integration owners with examples
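The routing rules above can be sketched as an ordered classification followed by an owner lookup. This is a hedged illustration: the symptom keys, the `FailureDomain` enum, and the owner labels for the scan and analytics domains are assumptions made for the example. Only the workflow and integration owners are named in this guide.

```python
from enum import Enum


class FailureDomain(Enum):
    SCAN_EXECUTION = "scan execution"
    WORKFLOW = "queue/workflow"
    ANALYTICS = "analytics read path"
    INTEGRATION = "integration contract"
    UNKNOWN = "unclassified"


# Checks run in decision-tree order; the first confirmed symptom wins.
# The symptom keys are hypothetical names chosen for this sketch.
ORDERED_CHECKS = [
    ("scan_failed_or_delayed", FailureDomain.SCAN_EXECUTION),
    ("workflow_partial_failure", FailureDomain.WORKFLOW),
    ("analytics_inconsistent", FailureDomain.ANALYTICS),
    ("callback_check_failed", FailureDomain.INTEGRATION),
]

# Owner labels; the scan and analytics entries are placeholders.
OWNERS = {
    FailureDomain.SCAN_EXECUTION: "scan platform owners",
    FailureDomain.WORKFLOW: "workflow and capacity owners",
    FailureDomain.ANALYTICS: "analytics owners",
    FailureDomain.INTEGRATION: "integration owners",
    FailureDomain.UNKNOWN: "escalation",
}


def classify(symptoms: dict[str, bool]) -> FailureDomain:
    """Return the first failure domain whose symptom is confirmed."""
    for key, domain in ORDERED_CHECKS:
        if symptoms.get(key, False):
            return domain
    return FailureDomain.UNKNOWN


def route(symptoms: dict[str, bool]) -> tuple[FailureDomain, str]:
    """Classify the incident and look up who should receive it."""
    domain = classify(symptoms)
    return domain, OWNERS[domain]
```

Keeping the checks in one ordered list makes the "stop at the first confirmed root cause" rule explicit and keeps classification ahead of any remediation.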
Troubleshooting decision tree
Follow these branches in order. Stop at the first confirmed root cause and record evidence before moving to remediation.
1. Is the issue a failed or delayed scan execution?
Yes: Scan path
Check queue lag, worker health and run-level errors. Retry only after capacity and scope validation.
No: Continue
Move to the workflow branch to isolate trigger or orchestration failures.
2. Did a workflow trigger run but end with partial failure?
Yes: Workflow branch
Inspect failed URL set, dedupe policy and retry rules. Split high-risk targets from stable targets for controlled rerun.
No: Continue
Move to analytics and reporting branch for read-path or output interpretation issues.
3. Are analytics outputs incomplete or inconsistent?
Yes: Analytics branch
Validate tracker deployment, ingestion windows and cache policy. Confirm event schema normalization before attribution decisions.
No: Continue
Move to integration branch to validate webhook contract and callback handling.
4. Do integration callbacks fail signature or payload checks?
Yes: Integration branch
Verify secret rotation, signature validation and idempotency rules. Re-run with a captured payload sample.
No: Escalate
Prepare escalation package with run IDs, timestamps, observed impact and attempted actions.
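For the integration branch in step 4, replaying a captured payload through a local check can show whether the failure is a signature mismatch or a duplicate delivery. The sketch below assumes an HMAC-SHA256 signature sent as a hex string and a per-delivery ID for idempotency. This guide does not specify the actual signing scheme, so treat both as assumptions and confirm them against your integration contract. The function names are hypothetical.

```python
import hashlib
import hmac


def signature_valid(secret: bytes, body: bytes, received_hex: str) -> bool:
    """Compare the received signature with one computed over the raw body."""
    expected = hmac.new(secret, body, hashlib.sha256).hexdigest()
    # compare_digest avoids leaking match position through timing.
    return hmac.compare_digest(expected, received_hex)


def check_callback(
    body: bytes,
    received_hex: str,
    delivery_id: str,
    secrets: list[bytes],
    seen_ids: set[str],
) -> str:
    """Classify one captured callback as accept, duplicate, or rejected.

    `secrets` holds the current secret plus any previous one still inside
    its rotation window (an assumed rotation policy).
    """
    if not any(signature_valid(s, body, received_hex) for s in secrets):
        return "reject: signature mismatch"
    if delivery_id in seen_ids:
        return "skip: duplicate delivery"
    seen_ids.add(delivery_id)
    return "accept"
```

A mismatch that clears once the previous secret is added to `secrets` points to secret rotation as the root cause. A mismatch with every known secret more often points to the body being altered (for example re-serialized) before verification.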
Escalation
Need hands-on incident assistance?
If the decision tree does not isolate root cause, open an escalation with full run evidence.
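One way to make sure an escalation carries full run evidence is to assemble it as a structured record and check it for gaps before handoff. The following is a minimal sketch. The `EvidenceBundle` class and its field names are hypothetical and simply mirror the items this guide asks for (run IDs, timestamps, observed impact, attempted actions, payload samples). It is not a required submission format.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class EvidenceBundle:
    """Hypothetical escalation record mirroring the items listed in this guide."""
    run_ids: list[str]
    timestamps: list[str]
    observed_impact: str
    attempted_actions: list[str] = field(default_factory=list)
    payload_samples: list[str] = field(default_factory=list)

    def missing_fields(self) -> list[str]:
        """Names of required fields that are still empty."""
        required = ["run_ids", "timestamps", "observed_impact", "attempted_actions"]
        return [name for name in required if not getattr(self, name)]

    def to_json(self) -> str:
        """Serialize the bundle for attachment to an escalation ticket."""
        return json.dumps(asdict(self), indent=2)
```

Running `missing_fields()` before opening the escalation gives a quick completeness check. Payload samples are treated as optional here because not every failure domain produces one.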