Troubleshooting

Use structured decision paths to isolate failures, reduce time-to-resolution and escalate with complete evidence.

What this solves

This guide provides a repeatable incident-handling model so teams can diagnose issues quickly instead of relying on ad-hoc guesswork.

Who is this for

  • On-call and incident response teams
  • QA and platform operators handling scan failures
  • Managers coordinating cross-team remediation

Prerequisites

  • Incident owner assigned
  • Relevant run IDs and timestamps collected
  • Affected surface and impact scope documented

Step-by-step

1. Classify failure domain

Identify whether the issue lies in scan execution, the queue or workflow layer, the analytics read path, or the integration contract.

2. Check primary health signals

Review queue depth, recent failures, and endpoint status for immediate containment actions.
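As a minimal sketch of this check, the three signals can be folded into one snapshot that maps to containment actions. The field names, thresholds and action wording below are illustrative assumptions, not Crawlens defaults.

```python
from dataclasses import dataclass


@dataclass
class HealthSnapshot:
    queue_depth: int      # jobs currently waiting
    recent_failures: int  # failed runs in the review window
    endpoint_ok: bool     # result of the endpoint status check


# Hypothetical thresholds; tune them to your own baseline.
MAX_QUEUE_DEPTH = 500
MAX_RECENT_FAILURES = 10


def containment_actions(snapshot: HealthSnapshot) -> list[str]:
    """Return the immediate containment actions suggested by the snapshot."""
    actions = []
    if not snapshot.endpoint_ok:
        actions.append("pause new submissions until the endpoint recovers")
    if snapshot.queue_depth > MAX_QUEUE_DEPTH:
        actions.append("notify workflow and capacity owners about queue saturation")
    if snapshot.recent_failures > MAX_RECENT_FAILURES:
        actions.append("hold retries and inspect run-level errors")
    return actions
```

An empty list means none of the three signals calls for containment, so the incident owner can move straight to the decision tree.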

3. Apply domain-specific decision tree

Follow the yes/no branches in the troubleshooting decision tree to isolate the root cause and determine the next operational action.

4. Escalate with evidence package

Include the incident timeline, payload samples and attempted remediation steps in the escalation handoff.
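One way to keep handoffs complete is to model the evidence package as a small data structure that can report its own gaps. This is a sketch only: the class and field names are hypothetical and do not reflect a Crawlens schema.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class EvidencePackage:
    run_ids: list[str]
    timeline: list[dict]  # entries such as {"at": <timestamp>, "event": <description>}
    payload_samples: list[dict] = field(default_factory=list)
    attempted_remediation: list[str] = field(default_factory=list)

    def missing_fields(self) -> list[str]:
        """List the sections that are still empty, in declaration order."""
        return [name for name, value in asdict(self).items() if not value]

    def to_json(self) -> str:
        """Serialize the package for attachment to an escalation ticket."""
        return json.dumps(asdict(self), indent=2, sort_keys=True)
```

Checking `missing_fields()` before opening the escalation is a cheap way to catch a handoff that lacks payload samples or a remediation log.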

Operational outputs

  • Incident classification and owner assignment
  • Root-cause hypothesis log with validation steps
  • Escalation-ready evidence bundle

Plan availability

  • Troubleshooting guidance is available across plans
  • The depth of operational telemetry depends on the observability features included in your plan
  • Enterprise support models can include expanded response workflows

Limits and guardrails

  • Do not skip incident classification before remediation
  • Avoid parallel fixes unless each one has a named owner and a change log entry
  • Escalate when repeated remediation attempts exceed the limit set by your internal timeout policy

Expected outcome

  • Incident diagnosis becomes faster and less noisy
  • Escalations are actionable on first handoff
  • Post-incident learning is captured for future prevention

Troubleshooting paths

  • Use the decision tree to select first response action
  • Route queue saturation to workflow and capacity owners
  • Route schema or payload issues to integration owners with examples

Troubleshooting decision tree

Follow these branches in order. Stop at the first confirmed root cause and record evidence before moving to remediation.

1. Is the issue a failed or delayed scan execution?

Yes: Scan path

Check queue lag, worker health and run-level errors. Retry only after capacity and scope validation.
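The retry rule can be expressed as a single gate that passes only when capacity, scope and the attempt budget all check out. The parameter names and default limits here are assumptions for illustration; substitute the values from your own capacity and timeout policy.

```python
def may_retry(
    queue_lag_seconds: float,
    healthy_workers: int,
    scope_validated: bool,
    attempts: int,
    max_lag_seconds: float = 300.0,  # hypothetical lag ceiling
    max_attempts: int = 3,           # hypothetical retry budget
) -> bool:
    """Allow a retry only when capacity and scope checks both pass."""
    has_capacity = healthy_workers > 0 and queue_lag_seconds <= max_lag_seconds
    within_budget = attempts < max_attempts
    return has_capacity and scope_validated and within_budget
```

When the gate returns False because the attempt budget is spent, that is the point to escalate instead of retrying again.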

No: Continue

Move to the workflow branch to isolate trigger or orchestration failures.

2. Did a workflow trigger run but end with partial failure?

Yes: Workflow branch

Inspect failed URL set, dedupe policy and retry rules. Split high-risk targets from stable targets for controlled rerun.
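A controlled rerun might be prepared as follows: dedupe the failed URL set, then separate targets that have failed repeatedly from those that failed once. The failure-count input and the threshold are hypothetical; the source does not define how "high-risk" is measured.

```python
def split_for_rerun(
    failed_urls: list[str],
    failure_counts: dict[str, int],
    high_risk_threshold: int = 2,  # assumed cutoff for "high-risk"
) -> tuple[list[str], list[str]]:
    """Dedupe failed URLs, then split them into (stable, high_risk), preserving order."""
    unique = list(dict.fromkeys(failed_urls))  # keeps first occurrence of each URL
    stable, high_risk = [], []
    for url in unique:
        if failure_counts.get(url, 0) >= high_risk_threshold:
            high_risk.append(url)
        else:
            stable.append(url)
    return stable, high_risk
```

Rerunning the stable list first confirms that the workflow itself is healthy before the high-risk targets are retried in a smaller batch.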

No: Continue

Move to the analytics branch to check for read-path or output interpretation issues.

3. Are analytics outputs incomplete or inconsistent?

Yes: Analytics branch

Validate tracker deployment, ingestion windows and cache policy. Confirm event schema normalization before attribution decisions.
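To illustrate the schema check, the sketch below normalizes event keys and reports which events lack required fields. The required field names are placeholders, since the actual event schema is not described here.

```python
REQUIRED_FIELDS = ("event", "url", "timestamp")  # hypothetical schema


def normalize_event(raw: dict) -> dict:
    """Lower-case and strip keys so variants such as ' URL ' and 'url' compare equal."""
    return {str(key).strip().lower(): value for key, value in raw.items()}


def schema_problems(events: list[dict]) -> list[tuple[int, list[str]]]:
    """Return (index, missing_fields) for every event that fails the schema check."""
    problems = []
    for index, raw in enumerate(events):
        event = normalize_event(raw)
        missing = [name for name in REQUIRED_FIELDS if event.get(name) in (None, "")]
        if missing:
            problems.append((index, missing))
    return problems
```

An empty result suggests the events are consistent enough for attribution; any reported index points to a sample worth attaching to the evidence bundle.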

No: Continue

Move to the integration branch to validate the webhook contract and callback handling.

4. Do integration callbacks fail signature or payload checks?

Yes: Integration branch

Verify secret rotation, signature validation and idempotency rules. Re-run with a captured payload sample.
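The checks in this branch can be replayed locally against a captured payload. The sketch assumes an HMAC-SHA256 signature sent as a hex string and a unique delivery ID per callback; both are assumptions about the contract, so confirm the actual signing scheme before relying on it.

```python
import hashlib
import hmac


def signature_is_valid(body: bytes, signature_hex: str, secrets: list[bytes]) -> bool:
    """Accept the payload if any active secret (current or previous) produces the signature."""
    for secret in secrets:
        expected = hmac.new(secret, body, hashlib.sha256).hexdigest()
        # Constant-time comparison avoids leaking how many characters matched.
        if hmac.compare_digest(expected.encode(), signature_hex.encode()):
            return True
    return False


class IdempotencyGuard:
    """In-memory guard that lets each delivery ID through only once."""

    def __init__(self) -> None:
        self._seen: set[str] = set()

    def first_delivery(self, delivery_id: str) -> bool:
        if delivery_id in self._seen:
            return False
        self._seen.add(delivery_id)
        return True
```

Passing both the new and the previous secret during a rotation window shows whether a failure comes from a stale secret on one side. A production guard would need shared, persistent storage instead of an in-memory set.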

No: Escalate

Prepare an escalation package with run IDs, timestamps, observed impact and attempted actions.
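The four questions above form an ordered chain, which can be sketched as a function that returns the first matching branch. The type and field names are hypothetical; the ordering and the escalation fallback come from the tree itself.

```python
from dataclasses import dataclass
from enum import Enum


class Branch(Enum):
    SCAN = "scan"
    WORKFLOW = "workflow"
    ANALYTICS = "analytics"
    INTEGRATION = "integration"
    ESCALATE = "escalate"


@dataclass
class Symptoms:
    scan_failed_or_delayed: bool = False
    workflow_partial_failure: bool = False
    analytics_incomplete_or_inconsistent: bool = False
    callback_checks_failing: bool = False


def first_branch(symptoms: Symptoms) -> Branch:
    """Walk the questions in order and stop at the first 'yes'."""
    if symptoms.scan_failed_or_delayed:
        return Branch.SCAN
    if symptoms.workflow_partial_failure:
        return Branch.WORKFLOW
    if symptoms.analytics_incomplete_or_inconsistent:
        return Branch.ANALYTICS
    if symptoms.callback_checks_failing:
        return Branch.INTEGRATION
    return Branch.ESCALATE  # no branch matched: hand off with evidence
```

Because evaluation stops at the first match, an incident with both scan and callback symptoms is routed to the scan path first, matching the instruction to follow the branches in order.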

Escalation

Need hands-on incident assistance?

If the decision tree does not isolate the root cause, open an escalation with full run evidence.
