
parlov-analysis

Analysis engine — the Analyzer trait, ExistenceAnalyzer, three-layer scoring pipeline, signal extractors, and family-based deduplication.

Status: Implemented

Version: 0.5.0 | Files: 16 | Lines: 2,781 | Dependencies: parlov-core, http, bytes

The analysis engine. Takes DifferentialSet values (paired baseline/probe exchanges) and produces OracleResult values (verdicts, confidence scores, severity, and evidence). Pure synchronous computation — no I/O, no async.


Public API

The Analyzer Trait

pub trait Analyzer: Send + Sync {
    /// Decides whether enough data exists or more samples are needed.
    fn evaluate(&self, data: &DifferentialSet) -> SampleDecision;

    /// Which oracle class this analyzer serves.
    fn oracle_class(&self) -> OracleClass;

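    /// Provided method: calls evaluate() and unwraps Complete, panicking on NeedMore.
    /// Convenience for callers who know they have enough data.
    fn analyze(&self, data: &DifferentialSet) -> OracleResult {
        match self.evaluate(data) {
            SampleDecision::Complete(result) => *result,
            SampleDecision::NeedMore => panic!("analyze() called before enough samples were collected"),
        }
    }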
}

SampleDecision

pub enum SampleDecision {
    /// Analysis complete. Contains the final OracleResult.
    Complete(Box<OracleResult>),
    /// More samples needed. The scheduler should collect additional exchanges
    /// and call evaluate() again.
    NeedMore,
}

The adaptive sampling loop: the scheduler calls evaluate() after each sample round. If the analyzer returns NeedMore, the scheduler executes another probe pair and appends the resulting exchanges to the DifferentialSet. This repeats until evaluate() returns Complete or a maximum-sample ceiling is reached.
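The sketch below shows that loop using simplified stand-ins for DifferentialSet, OracleResult, SampleDecision, and the Analyzer trait. The real definitions carry more fields, and here each sample is reduced to a (baseline_status, probe_status) pair. The run_adaptive helper is hypothetical. Returning None when the ceiling is reached is an assumption about how a scheduler might report that case.

```rust
// Simplified stand-ins for the crate's types; fields are assumed.
#[derive(Debug, Clone, PartialEq)]
pub struct OracleResult {
    pub verdict: &'static str,
}

pub enum SampleDecision {
    Complete(Box<OracleResult>),
    NeedMore,
}

/// Hypothetical: each sample round is one (baseline_status, probe_status) pair.
#[derive(Debug, Default)]
pub struct DifferentialSet {
    pub pairs: Vec<(u16, u16)>,
}

pub trait Analyzer {
    fn evaluate(&self, data: &DifferentialSet) -> SampleDecision;
}

/// Collects one probe pair per round and re-evaluates until the analyzer
/// returns Complete or `max_samples` pairs have been collected.
pub fn run_adaptive<A: Analyzer>(
    analyzer: &A,
    mut collect_pair: impl FnMut() -> (u16, u16),
    max_samples: usize,
) -> Option<OracleResult> {
    let mut data = DifferentialSet::default();
    while data.pairs.len() < max_samples {
        data.pairs.push(collect_pair());
        if let SampleDecision::Complete(result) = analyzer.evaluate(&data) {
            return Some(*result);
        }
        // NeedMore: loop back and collect another pair.
    }
    // Ceiling reached without a Complete decision (assumed behavior).
    None
}
```

Because the collection step is a closure, the same loop works whether pairs come from live HTTP exchanges or from canned test data.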

ExistenceAnalyzer

The concrete implementation for the Existence oracle class. This is currently the only analyzer.

pub struct ExistenceAnalyzer;

Sampling logic in evaluate():

  1. Compare baseline[0].status vs probe[0].status
  2. If identical: short-circuit to Complete after one sample via build_result. If no body or header signals exist, the result is NotPresent. If signals are present (for example, the same status code but different response bodies), the full classification pipeline runs and can produce Likely or Confirmed.
  3. If different and < 3 samples: return NeedMore
  4. If different and >= 3 samples: check consistency (all baseline statuses match, all probe statuses match)
    • Stable: run full classification pipeline
    • Unstable: return NotPresent with an instability annotation
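The steps above can be sketched as a pure function over the baseline and probe status sequences. This is an illustration under assumptions. The hypothetical decide function returns a Step value naming what the real evaluate() would do next instead of building an OracleResult. It also treats an empty set as needing more data, which the text above does not specify.

```rust
/// What the real evaluate() would do next (hypothetical simplification).
#[derive(Debug, PartialEq)]
pub enum Step {
    /// Same first status: build the result from a single sample.
    BuildFromSingleSample,
    /// Statuses differ but fewer than MIN_SAMPLES pairs exist.
    NeedMore,
    /// Statuses differ and every round agreed: run the full pipeline.
    RunFullPipeline,
    /// Statuses differ but flipped between rounds: NotPresent, flagged unstable.
    NotPresentUnstable,
}

pub const MIN_SAMPLES: usize = 3;

pub fn decide(baseline: &[u16], probe: &[u16]) -> Step {
    let (Some(&b0), Some(&p0)) = (baseline.first(), probe.first()) else {
        return Step::NeedMore; // no data yet (assumption)
    };
    if b0 == p0 {
        return Step::BuildFromSingleSample;
    }
    if baseline.len().min(probe.len()) < MIN_SAMPLES {
        return Step::NeedMore;
    }
    // Consistency: every baseline status matches the first baseline status,
    // and every probe status matches the first probe status.
    let stable = baseline.iter().all(|&s| s == b0) && probe.iter().all(|&s| s == p0);
    if stable {
        Step::RunFullPipeline
    } else {
        Step::NotPresentUnstable
    }
}
```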

Internal Architecture

Three-Layer Scoring Pipeline

The classification pipeline in existence/classifier.rs composes three layers:

Layer 1: Pattern Table Lookup (existence/patterns.rs)

A static lookup table mapping (baseline_status, probe_status) pairs to base confidence, base impact, labels, leak descriptions, and RFC basis strings. Patterns are grouped by confidence tier: strong (base_confidence >= 85), upper-moderate (82–84), lower-moderate (80–82), and weak (< 80). This is the coarsest signal — a status code differential alone provides base evidence.
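The sketch below shows the table's shape as a match over (baseline_status, probe_status). The PatternMatch struct here is a trimmed, hypothetical version that omits the leak description and RFC basis fields. The scores and labels in the arms are placeholders, not values from patterns.rs.

```rust
/// Trimmed, hypothetical stand-in for one pattern-table row.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct PatternMatch {
    pub base_confidence: u8,
    pub base_impact: u8,
    pub label: &'static str,
}

/// Looks up a status pair; unknown pairs yield no base evidence.
pub fn lookup_pattern(baseline: u16, probe: u16) -> Option<PatternMatch> {
    // Placeholder scores and labels, for illustration only.
    let (base_confidence, base_impact, label) = match (baseline, probe) {
        (200, 404) => (85, 10, "placeholder: found vs not found"),
        (403, 404) => (80, 5, "placeholder: forbidden vs not found"),
        _ => return None,
    };
    Some(PatternMatch { base_confidence, base_impact, label })
}
```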

Layer 2: Signal-Weighted Scoring (existence/signal_weights.rs, existence/families.rs)

Four signal extractors run unconditionally on every DifferentialSet:

| Extractor | Module | What it detects |
|---|---|---|
| Status code | signals/status_code.rs | Status code differential between baseline and probe |
| Header | signals/header.rs | Header presence differential, header value differential |
| Metadata | signals/metadata.rs | Metadata leaks (Content-Range size, ETag values) |
| Body | signals/body.rs | Body content differential, content-type mismatch |

Each signal is weighted by kind and evidence content:

| Signal | Raw confidence | Raw impact | Family |
|---|---|---|---|
| Content-Range header presence | 12 | 8 | Range |
| ETag header presence | 10 | 5 | CacheValidator |
| Last-Modified header presence | 8 | 5 | CacheValidator |
| WWW-Authenticate header presence | 8 | 8 | Auth |
| Accept-Ranges header presence | 5 | 0 | Range |
| Allow header presence | 6 | 5 | General |
| Generic header presence | 3 | 0 | General |
| Content-Range size leak | 5 | 15 | Range |
| ETag metadata leak | 3 | 5 | (from evidence) |
| Body content diff | 70 | 15 | ErrorBody |
| Body content-type mismatch | 25 | 10 | ErrorBody |
| Header value diff | 4 | 3 | (from evidence) |

Weights are modified by two multipliers:

  • Normative strength: Must/MustNot = 1.0x, Should = 0.9x, May = 0.75x
  • Body diff attenuation: When status codes already differ, body diff confidence is reduced to 0.25x (body differences are expected when handlers differ)
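The sketch below assumes the two multipliers compose by multiplication and that both act on raw confidence. The text above states the target explicitly only for the attenuation. Normative and adjusted_confidence are hypothetical names.

```rust
/// RFC normative strength of the requirement a signal relies on.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Normative {
    Must,
    MustNot,
    Should,
    May,
}

impl Normative {
    pub fn multiplier(self) -> f64 {
        match self {
            Normative::Must | Normative::MustNot => 1.0,
            Normative::Should => 0.9,
            Normative::May => 0.75,
        }
    }
}

/// Body diffs keep a quarter of their confidence when status codes already differ.
pub const BODY_DIFF_ATTENUATION: f64 = 0.25;

pub fn adjusted_confidence(
    raw: f64,
    strength: Normative,
    is_body_diff: bool,
    status_differs: bool,
) -> f64 {
    let mut confidence = raw * strength.multiplier();
    if is_body_diff && status_differs {
        confidence *= BODY_DIFF_ATTENUATION;
    }
    confidence
}
```

For example, a body content diff at raw confidence 70 and 1.0x normative strength drops to 17.5 when the status codes also differ.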

Layer 3: Family Adjustment & Verdict Derivation (existence/families.rs, existence/scoring.rs)

Signals from the same RFC mechanism are grouped into families to prevent correlated evidence from inflating scores:

| Family | Signals |
|---|---|
| Range | 206, 416, Content-Range, Accept-Ranges |
| CacheValidator | 304, ETag, Last-Modified |
| Auth | 401, 403, WWW-Authenticate |
| Precondition | 412, If-Match/If-Unmodified-Since |
| Negotiation | 406, Accept |
| ErrorBody | Body content differentials |
| General | Everything else |
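The family table maps naturally onto the header_family() and status_code_family() functions named under Extension Points. The signatures and the SignalFamily definition below are assumptions for illustration. The sketch also assumes ErrorBody is attached to body signals directly, so neither mapper returns it. Header names are compared case-insensitively, as HTTP requires.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum SignalFamily {
    Range,
    CacheValidator,
    Auth,
    Precondition,
    Negotiation,
    ErrorBody,
    General,
}

pub fn status_code_family(status: u16) -> SignalFamily {
    match status {
        206 | 416 => SignalFamily::Range,
        304 => SignalFamily::CacheValidator,
        401 | 403 => SignalFamily::Auth,
        412 => SignalFamily::Precondition,
        406 => SignalFamily::Negotiation,
        _ => SignalFamily::General,
    }
}

pub fn header_family(name: &str) -> SignalFamily {
    // HTTP header names are case-insensitive.
    match name.to_ascii_lowercase().as_str() {
        "content-range" | "accept-ranges" => SignalFamily::Range,
        "etag" | "last-modified" => SignalFamily::CacheValidator,
        "www-authenticate" => SignalFamily::Auth,
        "if-match" | "if-unmodified-since" => SignalFamily::Precondition,
        "accept" => SignalFamily::Negotiation,
        _ => SignalFamily::General,
    }
}
```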

Diminishing returns within a family:

  • 1st signal: full confidence (capped at 75)
  • 2nd signal: 50% confidence
  • 3rd+ signal: 0% confidence
  • Impact points always count fully regardless of family position
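Within one family, the rules above reduce to a short fold. The sketch makes three assumptions: signals are ranked in the order given (the real code may sort them first), the 75 cap applies only to the first signal, and the 50% factor applies to the second signal's raw confidence. family_contribution is a hypothetical name.

```rust
/// Confidence cap for the first signal in a family.
pub const FIRST_SIGNAL_CAP: f64 = 75.0;
/// Share of raw confidence kept by the second signal in a family.
pub const SECOND_SIGNAL_FACTOR: f64 = 0.5;

/// Sums one family's (confidence, impact) pairs with diminishing returns on
/// confidence; impact always counts in full.
pub fn family_contribution(signals: &[(f64, f64)]) -> (f64, f64) {
    let mut confidence = 0.0;
    let mut impact = 0.0;
    for (position, &(conf, imp)) in signals.iter().enumerate() {
        confidence += match position {
            0 => conf.min(FIRST_SIGNAL_CAP),
            1 => conf * SECOND_SIGNAL_FACTOR,
            _ => 0.0, // third and later signals add no confidence
        };
        impact += imp;
    }
    (confidence, impact)
}
```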

Corroboration bonus for independent signal families:

  • 2 families: +3 confidence
  • 3 families: +6 confidence
  • 4+ families: +8 confidence
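The sketch below counts independent families and maps the count to a bonus. It assumes a family counts once it contributes at least one signal. The hypothetical corroboration_bonus function is generic, so any family identifier that implements Eq and Hash works.

```rust
use std::collections::HashSet;
use std::hash::Hash;

/// Bonus for corroborating evidence across distinct signal families.
pub fn corroboration_bonus<F: Eq + Hash>(signal_families: &[F]) -> u32 {
    // Each family counts once, no matter how many signals it produced.
    let distinct: HashSet<&F> = signal_families.iter().collect();
    match distinct.len() {
        0 | 1 => 0,
        2 => 3,
        3 => 6,
        _ => 8,
    }
}
```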

Verdict thresholds:

  • Confidence >= 80: Confirmed
  • Confidence >= 60: Likely
  • Below 60: NotPresent
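Verdict derivation is a pair of threshold comparisons. The sketch below uses a hypothetical derive_verdict function. It takes confidence as f64 because the 50% family factor can produce fractional scores, which is an assumption about the real numeric type.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Verdict {
    Confirmed,
    Likely,
    NotPresent,
}

pub const CONFIRMED_THRESHOLD: f64 = 80.0;
pub const LIKELY_THRESHOLD: f64 = 60.0;

pub fn derive_verdict(confidence: f64) -> Verdict {
    if confidence >= CONFIRMED_THRESHOLD {
        Verdict::Confirmed
    } else if confidence >= LIKELY_THRESHOLD {
        Verdict::Likely
    } else {
        Verdict::NotPresent
    }
}
```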

Severity gating:

  • Confirmed verdict: severity = impact class directly (High/Medium/Low)
  • Likely verdict: severity capped one level below impact class (High->Medium, Medium->Low, Low->Low)
  • NotPresent / Inconclusive: no severity

Impact class derivation:

  • Size leak present: High
  • Metadata signals + impact score >= 40: Medium
  • Impact score >= 35: Medium
  • Otherwise: Low
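Impact class and severity gating combine as shown below. All names in the sketch are hypothetical. Severity reuses the ImpactClass enum instead of a separate severity type, and the impact rules are checked in the listed order.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum ImpactClass {
    Low,
    Medium,
    High,
}

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Verdict {
    Confirmed,
    Likely,
    NotPresent,
    Inconclusive,
}

/// Impact class from the listed rules, checked in order.
pub fn impact_class(size_leak: bool, has_metadata: bool, impact_score: u32) -> ImpactClass {
    if size_leak {
        return ImpactClass::High;
    }
    let medium = (has_metadata && impact_score >= 40) || impact_score >= 35;
    if medium {
        ImpactClass::Medium
    } else {
        ImpactClass::Low
    }
}

/// Confirmed keeps the impact class, Likely drops one level (floored at Low),
/// and any other verdict carries no severity.
pub fn severity(verdict: Verdict, impact: ImpactClass) -> Option<ImpactClass> {
    match verdict {
        Verdict::Confirmed => Some(impact),
        Verdict::Likely => Some(match impact {
            ImpactClass::High => ImpactClass::Medium,
            ImpactClass::Medium | ImpactClass::Low => ImpactClass::Low,
        }),
        Verdict::NotPresent | Verdict::Inconclusive => None,
    }
}
```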

Module Map

parlov-analysis/src/
├── lib.rs              # Analyzer trait, SampleDecision, re-exports
├── existence/
│   ├── mod.rs          # Re-exports ExistenceAnalyzer
│   ├── analyzer.rs     # ExistenceAnalyzer impl, adaptive sampling logic
│   ├── classifier.rs   # Three-layer pipeline composition
│   ├── patterns.rs     # Static (baseline, probe) -> PatternMatch table
│   ├── scoring.rs      # Confidence computation, verdict/severity derivation
│   ├── families.rs     # Signal family definitions, diminishing returns
│   └── signal_weights.rs  # Per-signal raw confidence/impact weights
└── signals/
    ├── mod.rs          # Test helpers (fake_exchange, diff_set builders)
    ├── status_code.rs  # Status code differential extractor
    ├── header.rs       # Header presence/value differential extractor
    ├── metadata.rs     # Metadata leak extractor
    └── body.rs         # Body content differential extractor

Extension Points

Adding a new oracle class analyzer: Implement the Analyzer trait. The analyzer receives DifferentialSet values and returns SampleDecision. Register it in the binary crate's subcommand dispatch.

Adding a new signal extractor: Create a new module under signals/. The extractor is a function fn extract(data: &DifferentialSet) -> Vec<Signal>. Add the call to extract_all_signals() in existence/analyzer.rs. Add corresponding weights in existence/signal_weights.rs.
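The sketch below shows the extractor shape using simplified stand-ins for the exchange, DifferentialSet, and Signal types. The real types also carry status codes and headers. The body-length check is a hypothetical example chosen only to show the pattern: inspect paired exchanges, then return zero or more signals.

```rust
/// Simplified stand-ins; the real types come from parlov-core.
pub struct Exchange {
    pub body: Vec<u8>,
}

pub struct DifferentialSet {
    pub baseline: Vec<Exchange>,
    pub probe: Vec<Exchange>,
}

#[derive(Debug, PartialEq)]
pub struct Signal {
    pub kind: &'static str,
    pub evidence: String,
}

/// Hypothetical extractor: emits one signal when the first baseline and
/// probe bodies differ in length.
pub fn extract(data: &DifferentialSet) -> Vec<Signal> {
    let (Some(b), Some(p)) = (data.baseline.first(), data.probe.first()) else {
        return Vec::new(); // nothing to compare yet
    };
    if b.body.len() == p.body.len() {
        return Vec::new();
    }
    vec![Signal {
        kind: "body_length_diff",
        evidence: format!("baseline {} bytes vs probe {} bytes", b.body.len(), p.body.len()),
    }]
}
```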

Adding a new signal family: Add a variant to the SignalFamily enum in families.rs. Map the relevant headers or status codes to it in header_family() and status_code_family(). Add tests.

Tuning scoring: Pattern base scores live in patterns.rs. Signal weights live in signal_weights.rs. Family caps, diminishing return curves, and corroboration bonuses live in families.rs. Verdict thresholds and severity gating live in scoring.rs. All are constants or simple match arms.
