parlov docs

Signal Scoring

The confidence and severity model — base anchors, normative weighting, signal families, corroboration bonuses, verdict thresholds, and why confidence and severity are separate axes.

Status: Implemented

When the detection pipeline produces a finding, it needs to answer two questions: "Is this real?" and "If real, how bad is it?" These are independent axes.

Confidence — "Is this differential real?"

Confidence is a per-signal weighted score that aggregates evidence quality.

Base anchor: The status code pair provides the initial confidence score. A well-known pair like 403/404 starts with high base confidence. An unclassified pair starts low.

Normative weighting adjusts confidence based on how strongly the relevant specification mandates the observed behavior:

  • MUST (weight: 1.0): The RFC requires this behavior. A server returning 304 to a GET or HEAD request carrying If-None-Match: * for an existing resource is doing exactly what the spec says. Strongest basis.
  • SHOULD (weight: 0.9): The RFC recommends this behavior. A server returning 206 on a valid Range request is honoring a recommendation. Strong basis.
  • MAY (weight: 0.75): The RFC permits but does not require this behavior. A server returning 406 on an unsupported Accept value is making a choice. Moderate basis.

Normative weighting is applied per technique, not per signal. All signals from a single probe pair share the technique's normative strength value.
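
A minimal sketch of how the weights above might be applied. The text says only that weighting "adjusts" confidence, so treating the weight as a multiplier on the base anchor is an assumption, as are the names NORMATIVE_WEIGHT and weighted_confidence and the illustrative base value of 80.

```python
# Weights come from the list above; using them as a multiplier is an assumption.
NORMATIVE_WEIGHT = {"MUST": 1.0, "SHOULD": 0.9, "MAY": 0.75}


def weighted_confidence(base_anchor: float, strength: str) -> float:
    """Scale a base anchor by the technique's normative strength."""
    return base_anchor * NORMATIVE_WEIGHT[strength]


# Every signal from one probe pair shares the technique's strength,
# so the weight is looked up once per technique, not once per signal.
for strength in ("MUST", "SHOULD", "MAY"):
    print(strength, weighted_confidence(80, strength))
```

Under this assumption, the same base anchor yields a lower score when the technique rests on a MAY than when it rests on a MUST.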

Reproducibility gating is binary: all samples must agree on status code for the differential to be classified. If any sample disagrees, the finding is immediately marked NotPresent. There is no partial-stability weighting.
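
The binary gate can be sketched as follows. This is an illustration, not the pipeline's actual code: the function names are hypothetical, and the sketch assumes that each side of the probe pair is sampled separately and that an empty sample list counts as unstable.

```python
def is_stable(status_codes):
    """True only if there is at least one sample and every sample agrees."""
    return len(status_codes) > 0 and len(set(status_codes)) == 1


def reproducibility_gate(baseline_codes, probe_codes):
    """Return None if the differential may be classified, else "NotPresent".

    There is no partial credit: one disagreeing sample fails the gate.
    """
    if is_stable(baseline_codes) and is_stable(probe_codes):
        return None
    return "NotPresent"


print(reproducibility_gate([403, 403, 403], [404, 404, 404]))  # stable on both sides
print(reproducibility_gate([403, 403, 404], [404, 404, 404]))  # one sample disagrees
```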

Corroboration bonus rewards independent confirmation from multiple signal types:

  • 2 independent signals: +3 confidence
  • 3 independent signals: +6
  • 4+ independent signals: +8
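
The bonus tiers above amount to a small lookup. In this sketch the function name is hypothetical, and returning 0 for fewer than two signals is an assumption, since the text lists no bonus for that case.

```python
def corroboration_bonus(independent_signals: int) -> int:
    """Map the number of independent signals to a confidence bonus."""
    if independent_signals >= 4:
        return 8
    if independent_signals == 3:
        return 6
    if independent_signals == 2:
        return 3
    return 0  # assumed: a single signal has nothing to corroborate it


for n in range(1, 6):
    print(n, corroboration_bonus(n))
```

The increments shrink as signals accumulate (+3, +3, +2, then nothing further), so additional corroboration past four signals does not raise the score.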

Signal Families — Preventing Double-Counting

Correlated signals from the same RFC mechanism are grouped into families. Each family has a maximum confidence yield to prevent a single mechanism from inflating the score through redundant evidence.

Family            Members
Range             206, 416, Content-Range, Accept-Ranges
Cache validator   304, ETag, Last-Modified
Auth              401, 403, WWW-Authenticate
Precondition      412, If-Match consequence, If-Unmodified-Since consequence
Negotiation       406, Accept consequence
Error body        Body-diff signals (error message granularity vector)
General           Signals not associated with a specific RFC mechanism

Within a family, the first signal contributes full confidence points. The second contributes half. The third and beyond contribute minimal or zero confidence points.

Impact points are scored independently — if a later signal in a family adds new leak content (e.g., Content-Range disclosing exact resource size), that's an impact finding, not double-counted confidence.
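
The diminishing confidence contribution within a family can be sketched as below. The source says the third and later signals contribute "minimal or zero" points; this sketch assumes zero. The names and point values are hypothetical, the per-family maximum yield is not modeled because its value is not stated, and impact points are left out because they are scored separately.

```python
# Full credit for the first signal, half for the second, nothing after (assumed).
FAMILY_FACTORS = (1.0, 0.5)


def family_confidence(points):
    """Sum confidence points for one family, in the order signals were observed."""
    total = 0.0
    for position, value in enumerate(points):
        factor = FAMILY_FACTORS[position] if position < len(FAMILY_FACTORS) else 0.0
        total += value * factor
    return total


# Three Range-family signals worth 10 points each contribute 10 + 5 + 0.
print(family_confidence([10, 10, 10]))
```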

Verdict Thresholds

  • Confirmed: confidence >= 80
  • Likely: confidence 60–79
  • Not Present: confidence < 60
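
As an illustration, the thresholds translate into a three-way comparison. The function name is hypothetical, and the sketch assumes that a fractional score such as 79.5 falls in the Likely band.

```python
def verdict(confidence: float) -> str:
    """Map a confidence score to the verdict bands listed above."""
    if confidence >= 80:
        return "Confirmed"
    if confidence >= 60:
        return "Likely"
    return "Not Present"


for score in (92, 80, 79, 60, 59):
    print(score, verdict(score))
```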

Severity — "If real, how bad is it?"

Severity reflects the peak impact among validated signals, independent of confidence. It is not scaled by confidence — a 55% probability of a critical leak is still a critical risk that warrants manual investigation.

Impact classification per leak type:

  • Existence only (status code differential, no metadata) → Low/Medium
  • Cache validator disclosed (ETag, Last-Modified values) → Medium
  • Authentication mechanism/realm disclosed (WWW-Authenticate parameters) → Medium
  • Exact resource size disclosed (Content-Range bytes */N) → High
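
Taking the peak impact among validated signals can be sketched as a maximum over an ordered scale. The leak-type keys and function name here are hypothetical, and existence-only findings are pinned to Low even though the list above gives a Low/Medium range; that choice is an assumption made to keep the example small.

```python
SEVERITY_ORDER = ["Low", "Medium", "High"]

IMPACT_BY_LEAK = {
    "existence_only": "Low",      # source says Low/Medium; lower bound assumed
    "cache_validator": "Medium",  # ETag, Last-Modified values
    "auth_mechanism": "Medium",   # WWW-Authenticate parameters
    "exact_size": "High",         # Content-Range bytes */N
}


def peak_severity(leak_types):
    """Return the highest impact class among validated leak types, or None."""
    if not leak_types:
        return None
    impacts = (IMPACT_BY_LEAK[leak] for leak in leak_types)
    return max(impacts, key=SEVERITY_ORDER.index)


print(peak_severity(["existence_only", "cache_validator", "exact_size"]))
```

Confidence does not appear anywhere in this calculation, which matches the statement that severity is not scaled by confidence.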

Confidence gating on severity display:

  • Confidence >= 80: display full impact class
  • Confidence 60–79: cap severity one level below impact, or display with "Likely" qualifier
  • Confidence < 60: suppress finding (Not Present)
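
One way to read the display rules is sketched below. For the 60 to 79 band the source offers two options; this sketch implements only the first (cap one level below impact) and assumes that Low cannot be lowered further. The function name and the use of None for a suppressed finding are hypothetical.

```python
SEVERITY_ORDER = ["Low", "Medium", "High"]


def displayed_severity(impact: str, confidence: float):
    """Return the severity to display, or None when the finding is suppressed."""
    if confidence < 60:
        return None  # Not Present: finding is suppressed
    if confidence >= 80:
        return impact  # Confirmed: full impact class
    # Likely: cap one level below impact (floor at Low is an assumption).
    lowered = max(SEVERITY_ORDER.index(impact) - 1, 0)
    return SEVERITY_ORDER[lowered]


print(displayed_severity("High", 85))
print(displayed_severity("High", 70))
print(displayed_severity("High", 55))
```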

Why Confidence and Severity Are Separate

A finding can be high-confidence, low-severity — a confirmed existence differential with no metadata leak. Or low-confidence, high-severity — an uncertain differential that, if real, discloses exact resource sizes. Collapsing these into a single score (e.g., impact × confidence) misrepresents the actual risk. The operator needs both dimensions to make a decision: confidence tells them whether to trust the finding, severity tells them what's at stake if they do.
