
The Scan Pipeline

Code orchestration from input to verdict — the four stages, workspace layout, crate boundaries, and the Probe and Analyzer traits.

Implemented

parlov's detection pipeline separates observation from interpretation from corroboration. Each layer has a distinct responsibility and zero knowledge of the layers above it.

Layer 1 — Detection (Code-Blind)

Layer 1 has no knowledge of HTTP semantics. It compares two sets of responses and checks whether any difference between them is reproducible.

Input: Two sets of responses — identical requests differing only in the resource identifier. Baseline uses the known-valid identifier. Probe uses the candidate identifier.

Process:

  1. Send baseline request, record response.
  2. Send probe request, record response.
  3. Compare: does any observable property differ?
  4. If a difference is detected, re-send both requests N times.
  5. If the differential is stable across all retries, emit a stable differential.

Output: Either a stable differential (with the raw values from both sides and the sample count) or nothing.

The default sample count is 3 — sufficient to filter transient infrastructure noise (GC pauses, load balancer jitter, connection resets) without being expensive. Stability is strict: all N samples on each side must return the same status code. Any inconsistency means the signal is unstable and is not forwarded to Layer 2.

What Layer 1 does not do: no exclusion lists, no pattern matching, no severity assignment, no verdict. It answers exactly one question — "is this difference real and reproducible?" — and nothing else.

Adaptive short-circuit: If the first pair shows no differential (same status code), Layer 1 returns "not present" after a single sample without retrying. Retries only happen when a differential is detected and needs stability confirmation.
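The Layer 1 loop above can be sketched in Rust. The names here (Observation, StableDifferential, detect) are illustrative, not parlov's actual API, and the transport is abstracted behind a closure so the logic stays code-blind:

```rust
/// One observed response; Layer 1 only inspects the status code.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
struct Observation {
    status: u16,
}

/// A reproducible difference between baseline and probe.
#[derive(Debug, PartialEq)]
struct StableDifferential {
    baseline: u16,
    probe: u16,
    samples: usize,
}

/// Drive the five steps: sample once, short-circuit when there is no
/// differential, otherwise re-sample and require every retry to repeat
/// both sides exactly.
fn detect<F>(mut send_pair: F, samples: usize) -> Option<StableDifferential>
where
    F: FnMut() -> (Observation, Observation),
{
    let (base, probe) = send_pair();
    // Adaptive short-circuit: identical status codes end the check
    // after a single sample, with no retries.
    if base.status == probe.status {
        return None;
    }
    // Stability confirmation: all N samples on each side must agree.
    for _ in 1..samples {
        let (b, p) = send_pair();
        if b.status != base.status || p.status != probe.status {
            return None; // unstable signal, never forwarded to Layer 2
        }
    }
    Some(StableDifferential {
        baseline: base.status,
        probe: probe.status,
        samples,
    })
}
```

The closure parameter keeps Layer 1 ignorant of how requests are sent, mirroring the layer's zero-knowledge boundary.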

Layer 2 — Classification (Code-Aware)

Layer 2 receives a confirmed stable differential and applies protocol-informed semantics to label and score it.

The classification draws on a pattern table that maps differential pairs to known oracle types. Each entry carries:

  • Label — a human-readable name for the oracle (e.g., "Authorization-based differential", "Conditional-request differential")
  • Confidence level — how strongly the pattern indicates an oracle
  • RFC basis — the specific specification section that grounds the behavior

Examples from the pattern table:

Baseline                  | Probe         | Label                                | Confidence | RFC Basis
--------------------------|---------------|--------------------------------------|------------|------------------
403 Forbidden             | 404 Not Found | Authorization-based differential     | High       | RFC 9110 §15.5.4
304 Not Modified          | 404 Not Found | Conditional-request differential     | High       | RFC 9110 §15.4.5
409 Conflict              | 201 Created   | Conflict-based creation differential | High       | RFC 9110 §15.5.10
412 Precondition Failed   | 404 Not Found | Precondition-failed differential     | High       | RFC 9110 §13.1.1
429 Too Many Requests     | 404 Not Found | Rate-limit-based differential        | Medium     | RFC 6585 §4
422 Unprocessable Content | 404 Not Found | Validation-path differential         | High       | RFC 9110 §15.5.21
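The pattern table lends itself to a lookup keyed on the (baseline, probe) status-code pair. A minimal sketch, with type and function names assumed for illustration rather than taken from parlov's code:

```rust
#[derive(Debug, PartialEq, Clone, Copy)]
enum Confidence {
    High,
    Medium,
}

/// One pattern-table entry: label, confidence, and the RFC section
/// that grounds the behavior.
struct Pattern {
    label: &'static str,
    confidence: Confidence,
    rfc_basis: &'static str,
}

/// Classify a stable differential by its status-code pair.
fn classify(baseline: u16, probe: u16) -> Option<Pattern> {
    let (label, confidence, rfc_basis) = match (baseline, probe) {
        (403, 404) => ("Authorization-based differential", Confidence::High, "RFC 9110 §15.5.4"),
        (304, 404) => ("Conditional-request differential", Confidence::High, "RFC 9110 §15.4.5"),
        (409, 201) => ("Conflict-based creation differential", Confidence::High, "RFC 9110 §15.5.10"),
        (412, 404) => ("Precondition-failed differential", Confidence::High, "RFC 9110 §13.1.1"),
        (429, 404) => ("Rate-limit-based differential", Confidence::Medium, "RFC 6585 §4"),
        (422, 404) => ("Validation-path differential", Confidence::High, "RFC 9110 §15.5.21"),
        // Unclassified pairs fall through to the base-confidence path.
        _ => return None,
    };
    Some(Pattern { label, confidence, rfc_basis })
}
```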

Verdict mapping:

  • High confidence patterns produce a Confirmed verdict — the differential has a well-understood RFC basis and the behavior is unambiguous.
  • Medium confidence patterns produce a Likely verdict — the differential is real but the status code semantics are broad or context-dependent.
  • Unclassified stable differentials (pairs not in the pattern table) receive a base confidence of 40. Without additional signals pushing confidence above the Likely threshold (60), they produce a NotPresent verdict. Additional corroborating signals can elevate unclassified differentials into Likely or Confirmed territory.
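The verdict mapping above can be sketched as a small function. The base confidence of 40 and the Likely threshold of 60 are documented; the path that elevates corroborated unclassified differentials all the way to Confirmed is omitted here because its threshold is not specified:

```rust
#[derive(Debug, PartialEq, Clone, Copy)]
enum Confidence {
    High,
    Medium,
}

#[derive(Debug, PartialEq)]
enum Verdict {
    Confirmed,
    Likely,
    NotPresent,
}

/// High patterns confirm, Medium patterns are likely; unclassified pairs
/// start at the base confidence and need corroborating signal points to
/// cross the Likely threshold.
fn verdict(pattern: Option<Confidence>, extra_signals: u32) -> Verdict {
    const BASE: u32 = 40; // documented base for unclassified differentials
    const LIKELY: u32 = 60; // documented Likely threshold
    match pattern {
        Some(Confidence::High) => Verdict::Confirmed,
        Some(Confidence::Medium) => Verdict::Likely,
        None if BASE + extra_signals >= LIKELY => Verdict::Likely,
        None => Verdict::NotPresent,
    }
}
```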

Layer 3 — Corroboration (Multi-Signal) (planned, not yet implemented)

Layer 3 is designed to cross-check medium or low-confidence findings with additional signal classes to promote or demote confidence. It is not yet active in the current pipeline.

Planned corroboration signals:

  • Body differential — do the response bodies diverge between baseline and probe? If a 400/201 differential also shows distinct body content ("email already exists" vs. account created), confidence would be promoted.
  • Header differential — different header sets between baseline and probe? WWW-Authenticate presence, Allow header values, Set-Cookie differences.
  • Timing differential — does one code path take measurably longer, suggesting deeper server-side execution?
  • Cross-method consistency — does the same differential appear across GET, HEAD, and DELETE for the same resource? Consistency across methods would strengthen confidence.
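One possible shape for the evidence a probe could attach to a finding. Since Layer 3 is not implemented, everything here is speculative: the field names and the timing threshold are assumptions, and None models a signal class that was simply not collected:

```rust
/// Corroboration evidence for one finding; `None` means the signal
/// class was not collected for this probe.
#[derive(Default)]
struct Evidence {
    body_differs: Option<bool>,
    header_differs: Option<bool>,
    timing_delta_ms: Option<i64>,
    cross_method_consistent: Option<bool>,
}

/// Count how many signal classes corroborate the differential.
fn corroborating(e: &Evidence) -> usize {
    [
        e.body_differs == Some(true),
        e.header_differs == Some(true),
        // Assumed cutoff for "measurably longer"; not from the docs.
        e.timing_delta_ms.map_or(false, |d| d.abs() > 100),
        e.cross_method_consistent == Some(true),
    ]
    .iter()
    .filter(|&&hit| hit)
    .count()
}
```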

Planned promotion and demotion:

  • Likely + body corroboration → Confirmed
  • Likely + no corroborating signals → remains Likely
  • Likely + contradicting signals (e.g., bodies are identical despite status code diff) → demoted
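The three rules above could reduce to a single match on the verdict and the body signal. This is a sketch of planned behavior only; the demotion target (NotPresent) is an assumption, since the text says only "demoted":

```rust
#[derive(Debug, PartialEq, Clone, Copy)]
enum Verdict {
    Confirmed,
    Likely,
    NotPresent,
}

/// Apply the planned promotion/demotion rules; `body_differs` is `None`
/// when no body signal was collected.
fn adjust(v: Verdict, body_differs: Option<bool>) -> Verdict {
    match (v, body_differs) {
        // Likely + body corroboration → Confirmed
        (Verdict::Likely, Some(true)) => Verdict::Confirmed,
        // Likely + contradicting signal (identical bodies) → demoted;
        // the demotion target is assumed, not documented.
        (Verdict::Likely, Some(false)) => Verdict::NotPresent,
        // No corroborating signals → verdict unchanged.
        (other, _) => other,
    }
}
```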

When implemented, Layer 3 will operate at the orchestration level, composing evidence across signal types collected by independent probes.
