Signal Scoring
The confidence and severity model — base anchors, normative weighting, signal families, corroboration bonuses, verdict thresholds, and why confidence and severity are separate axes.
When the detection pipeline produces a finding, it needs to answer two questions: is this real, and if it is real, how bad is it? The two answers are scored on independent axes.
Confidence — "Is this differential real?"
Confidence is a per-signal weighted score that aggregates evidence quality.
Base anchor: The status code pair provides the initial confidence score. A well-known pair like 403/404 starts with high base confidence. An unclassified pair starts low.
Normative weighting adjusts confidence based on how strongly the relevant specification mandates the observed behavior:
- MUST (weight: 1.0): The RFC requires this behavior. A server returning `304` on `If-None-Match: *` for an existing resource is doing exactly what the spec says. Strongest basis.
- SHOULD (weight: 0.9): The RFC recommends this behavior. A server returning `206` on a valid `Range` request is honoring a recommendation. Strong basis.
- MAY (weight: 0.75): The RFC permits but does not require this behavior. A server returning `406` on an unsupported `Accept` value is making a choice. Moderate basis.
Normative weighting is applied per technique, not per signal. All signals from a single probe pair share the technique's normative strength value.
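As a rough sketch of per-technique weighting, the snippet below stores the normative strength on the technique and applies it to any signal that technique produces. The names `Normative`, `Technique`, and `weighted_confidence` are hypothetical, and multiplying the base anchor by the weight is an assumption: the text gives the weights but not the exact formula.

```python
from dataclasses import dataclass
from enum import Enum


class Normative(Enum):
    """Normative strength of the RFC requirement behind a technique."""
    MUST = 1.0
    SHOULD = 0.9
    MAY = 0.75


@dataclass(frozen=True)
class Technique:
    name: str             # hypothetical technique label
    strength: Normative   # shared by every signal the technique produces


def weighted_confidence(base_anchor: float, technique: Technique) -> float:
    # Assumption: the weight scales the base anchor multiplicatively.
    return base_anchor * technique.strength.value


accept_probe = Technique("accept-negotiation", Normative.MAY)
print(weighted_confidence(80, accept_probe))  # 60.0
```

Keeping the weight on the technique rather than on each signal means two signals from the same probe pair can never end up with different normative strengths.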
Reproducibility gating is binary: all samples must agree on status code for the differential to be classified. If any sample disagrees, the finding is immediately marked NotPresent. There is no partial-stability weighting.
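The binary gate can be illustrated in a few lines. This is a minimal sketch, assuming each sample is recorded as a (baseline status, probe status) pair; the `StatusPair` alias and `is_reproducible` function are hypothetical names, and treating an empty sample set as a failure is also an assumption.

```python
from typing import Sequence, Tuple

StatusPair = Tuple[int, int]  # (baseline status, probe status) for one sample


def is_reproducible(samples: Sequence[StatusPair]) -> bool:
    """True only if every sample produced the same status code pair."""
    # Assumption: an empty sample set cannot confirm anything.
    return len(samples) > 0 and len(set(samples)) == 1


print(is_reproducible([(403, 404), (403, 404), (403, 404)]))  # True
print(is_reproducible([(403, 404), (403, 404), (404, 404)]))  # False
```

A finding whose samples fail this check would be marked NotPresent without any further scoring.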
Corroboration bonus rewards independent confirmation from multiple signal types:
- 2 independent signals: +3 confidence
- 3 independent signals: +6
- 4+ independent signals: +8
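The bonus tiers above map directly onto a small lookup. The function name `corroboration_bonus` is hypothetical, and how signals are judged independent is outside this sketch; it only assumes the caller has already counted them.

```python
def corroboration_bonus(independent_signals: int) -> int:
    """Return the confidence bonus for a count of independent signals."""
    if independent_signals >= 4:
        return 8
    if independent_signals == 3:
        return 6
    if independent_signals == 2:
        return 3
    return 0  # a single signal has nothing to corroborate it


for count in (1, 2, 3, 4, 7):
    print(count, corroboration_bonus(count))
```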
Signal Families — Preventing Double-Counting
Correlated signals from the same RFC mechanism are grouped into families. Each family has a maximum confidence yield to prevent a single mechanism from inflating the score through redundant evidence.
| Family | Members |
|---|---|
| Range | 206, 416, Content-Range, Accept-Ranges |
| Cache validator | 304, ETag, Last-Modified |
| Auth | 401, 403, WWW-Authenticate |
| Precondition | 412, If-Match consequence, If-Unmodified-Since consequence |
| Negotiation | 406, Accept consequence |
| Error body | Body-diff signals (error message granularity vector) |
| General | Signals not associated with a specific RFC mechanism |
Within a family, the first signal contributes full confidence points. The second contributes half. The third and beyond contribute minimal or zero confidence points.
Impact points are scored independently — if a later signal in a family adds new leak content (e.g., Content-Range disclosing exact resource size), that's an impact finding, not double-counted confidence.
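The diminishing contribution within a family could be sketched as follows. Several details are assumptions: the third and later signals are given exactly zero (the text allows "minimal or zero"), the per-family cap is an optional parameter with no stated value, and `family_confidence` and `POSITION_FACTORS` are hypothetical names. Impact scoring is deliberately left out, since it is tracked separately.

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional, Tuple

# Assumed multipliers for the 1st and 2nd signal in a family; later ones get 0.
POSITION_FACTORS = (1.0, 0.5)


def family_confidence(
    signals: Iterable[Tuple[str, float]],
    family_cap: Optional[float] = None,
) -> Dict[str, float]:
    """Sum confidence points per family with diminishing contributions."""
    seen: Dict[str, int] = defaultdict(int)
    totals: Dict[str, float] = defaultdict(float)
    for family, points in signals:
        position = seen[family]
        factor = POSITION_FACTORS[position] if position < len(POSITION_FACTORS) else 0.0
        totals[family] += points * factor
        seen[family] += 1
    if family_cap is not None:
        # Enforce the family's maximum confidence yield.
        return {name: min(total, family_cap) for name, total in totals.items()}
    return dict(totals)


signals = [("Range", 10.0), ("Range", 10.0), ("Range", 10.0), ("Auth", 8.0)]
print(family_confidence(signals))  # {'Range': 15.0, 'Auth': 8.0}
```

With these inputs, three Range signals yield 15 points rather than 30, while the single Auth signal keeps its full 8.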
Verdict Thresholds
- Confirmed: confidence >= 80
- Likely: confidence 60–79
- Not Present: confidence < 60
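Expressed as code, the thresholds reduce to two comparisons. The `Verdict` enum and `verdict_for` function are hypothetical names used only for illustration.

```python
from enum import Enum


class Verdict(Enum):
    CONFIRMED = "Confirmed"
    LIKELY = "Likely"
    NOT_PRESENT = "Not Present"


def verdict_for(confidence: float) -> Verdict:
    """Map a confidence score to a verdict using the thresholds above."""
    if confidence >= 80:
        return Verdict.CONFIRMED
    if confidence >= 60:
        return Verdict.LIKELY
    return Verdict.NOT_PRESENT


for score in (85, 80, 79, 60, 59):
    print(score, verdict_for(score).value)
```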
Severity — "If real, how bad is it?"
Severity reflects the peak impact among validated signals, independent of confidence. It is not scaled by confidence — a 55% probability of a critical leak is still a critical risk that warrants manual investigation.
Impact classification per leak type:
- Existence only (status code differential, no metadata) → Low/Medium
- Cache validator disclosed (ETag, Last-Modified values) → Medium
- Authentication mechanism/realm disclosed (WWW-Authenticate parameters) → Medium
- Exact resource size disclosed (Content-Range `bytes */N`) → High
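Since severity is the peak impact among validated signals, it can be sketched as a maximum over a lookup table. The leak-type keys, the `Severity` ordering, and the choice of Low for existence-only leaks (the list says Low/Medium) are all assumptions made for this illustration.

```python
from enum import IntEnum
from typing import Iterable, Optional


class Severity(IntEnum):
    # Assumed ordering of severity levels.
    LOW = 1
    MEDIUM = 2
    HIGH = 3
    CRITICAL = 4


# Hypothetical leak-type labels mapped to the impact classes listed above.
IMPACT_BY_LEAK = {
    "existence_only": Severity.LOW,      # text says Low/Medium; Low assumed here
    "cache_validator": Severity.MEDIUM,
    "auth_realm": Severity.MEDIUM,
    "exact_size": Severity.HIGH,
}


def peak_severity(leak_types: Iterable[str]) -> Optional[Severity]:
    """Return the highest impact among validated leak types, or None if empty."""
    impacts = [IMPACT_BY_LEAK[leak] for leak in leak_types]
    return max(impacts) if impacts else None


print(peak_severity(["existence_only", "cache_validator", "exact_size"]).name)  # HIGH
```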
Confidence gating on severity display:
- Confidence >= 80: display full impact class
- Confidence 60–79: cap severity one level below impact, or display with "Likely" qualifier
- Confidence < 60: suppress finding (Not Present)
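The display gating can be sketched as below. This takes the "cap one level below impact" option for the 60–79 band rather than the "Likely" qualifier; the `Severity` levels, their ordering, the floor at Low, and the function name `displayed_severity` are assumptions.

```python
from enum import IntEnum
from typing import Optional


class Severity(IntEnum):
    # Assumed ordering of severity levels.
    LOW = 1
    MEDIUM = 2
    HIGH = 3
    CRITICAL = 4


def displayed_severity(impact: Severity, confidence: float) -> Optional[Severity]:
    """Return the severity to display, or None when the finding is suppressed."""
    if confidence >= 80:
        return impact  # full impact class
    if confidence >= 60:
        # Cap one level below impact; assumed not to drop below LOW.
        return Severity(max(impact - 1, Severity.LOW))
    return None  # Not Present: finding is suppressed


print(displayed_severity(Severity.HIGH, 85).name)  # HIGH
print(displayed_severity(Severity.HIGH, 70).name)  # MEDIUM
print(displayed_severity(Severity.HIGH, 50))       # None
```

Note that the underlying impact class is unchanged by this step; only what is shown to the operator depends on confidence.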
Why Confidence and Severity Are Separate
A finding can be high-confidence, low-severity — a confirmed existence differential with no metadata leak. Or low-confidence, high-severity — an uncertain differential that, if real, discloses exact resource sizes. Collapsing these into a single score (e.g., impact × confidence) misrepresents the actual risk. The operator needs both dimensions to make a decision: confidence tells them whether to trust the finding, severity tells them what's at stake if they do.