parlov -- HTTP Oracle Detection

Analysis engine — the Analyzer trait, ExistenceAnalyzer, three-layer scoring pipeline, signal extractors, and family-based deduplication.

Version: 0.5.0 | Files: 16 | Lines: 2,781 | Dependencies: parlov-core, http, bytes

The analysis engine takes DifferentialSet values (paired baseline/probe exchanges) and produces OracleResult values (verdicts, confidence scores, severity, and evidence). It is pure synchronous computation: no I/O, no async.

Public API

The `Analyzer` Trait

pub trait Analyzer: Send + Sync {
    /// Decides whether enough data exists or more samples are needed.
    fn evaluate(&self, data: &DifferentialSet) -> SampleDecision;

    /// Which oracle class this analyzer serves.
    fn oracle_class(&self) -> OracleClass;

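    /// Provided method: calls evaluate() and unwraps Complete.
    /// Panics on NeedMore, so it is a convenience only for callers who
    /// know they have enough data.
    fn analyze(&self, data: &DifferentialSet) -> OracleResult {
        match self.evaluate(data) {
            SampleDecision::Complete(result) => *result,
            SampleDecision::NeedMore => panic!("analyze() requires enough samples"),
        }
    }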
}

`SampleDecision`

pub enum SampleDecision {
    /// Analysis complete. Contains the final OracleResult.
    Complete(Box<OracleResult>),
    /// More samples needed. The scheduler should collect additional exchanges
    /// and call evaluate() again.
    NeedMore,
}

The adaptive sampling loop: the scheduler calls evaluate() after each sample round. If the analyzer returns NeedMore, the scheduler executes another probe pair and appends the exchanges to the DifferentialSet. This continues until Complete or a max-sample ceiling is hit.
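
The loop described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not parlov's scheduler: the stub types stand in for the real ones in parlov-core, and `run_adaptive`, `collect_pair`, `MAX_SAMPLES`, and the `ThreeSamples` toy analyzer are hypothetical names. Returning `None` at the ceiling is also an assumption; the text does not say what the scheduler reports in that case.

```rust
// Stand-in types for illustration; the real ones live in parlov-core.
#[derive(Debug, Clone, PartialEq)]
pub struct OracleResult {
    pub verdict: &'static str,
}

#[derive(Default)]
pub struct DifferentialSet {
    pub baseline: Vec<u16>, // status codes only, for brevity
    pub probe: Vec<u16>,
}

pub enum SampleDecision {
    Complete(Box<OracleResult>),
    NeedMore,
}

pub trait Analyzer {
    fn evaluate(&self, data: &DifferentialSet) -> SampleDecision;
}

/// Hypothetical ceiling on sample rounds.
pub const MAX_SAMPLES: usize = 5;

/// Collects one probe pair per round and re-evaluates after each round.
/// Returns None if the ceiling is reached without a Complete decision.
pub fn run_adaptive<A: Analyzer>(
    analyzer: &A,
    mut collect_pair: impl FnMut() -> (u16, u16),
) -> Option<OracleResult> {
    let mut set = DifferentialSet::default();
    for _ in 0..MAX_SAMPLES {
        let (baseline_status, probe_status) = collect_pair();
        set.baseline.push(baseline_status);
        set.probe.push(probe_status);
        if let SampleDecision::Complete(result) = analyzer.evaluate(&set) {
            return Some(*result);
        }
    }
    None
}

/// Toy analyzer: completes once three pairs have been collected.
pub struct ThreeSamples;

impl Analyzer for ThreeSamples {
    fn evaluate(&self, data: &DifferentialSet) -> SampleDecision {
        if data.baseline.len() >= 3 {
            SampleDecision::Complete(Box::new(OracleResult { verdict: "Confirmed" }))
        } else {
            SampleDecision::NeedMore
        }
    }
}
```

Because the analyzer performs no I/O, the only side-effecting piece in this sketch is the `collect_pair` closure, which keeps the loop easy to test with canned status codes.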

`ExistenceAnalyzer`

The concrete implementation for the Existence oracle class. This is currently the only analyzer.

pub struct ExistenceAnalyzer;

Sampling logic in evaluate():

Compare baseline[0].status with probe[0].status.
If identical: short-circuit to Complete after 1 sample via build_result. When no body/header signals exist, this produces NotPresent. When signals are present (e.g. same status code but different response bodies), the full classification pipeline runs and can produce Likely or Confirmed.
If different and < 3 samples: return NeedMore.
If different and >= 3 samples: check consistency (all baseline statuses match, all probe statuses match).
- Stable: run the full classification pipeline.
- Unstable: return NotPresent with an instability annotation.
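
The decision steps above could be expressed as a small pure function. The sketch below is an assumption-laden illustration, not the code in existence/analyzer.rs: it works on bare status codes, and the `Step` enum, `sampling_step`, and `MIN_SAMPLES` are hypothetical names. Treating an empty set as "need more" is also an assumption.

```rust
/// Hypothetical outcome of the sampling check; the real evaluate()
/// returns a SampleDecision instead.
#[derive(Debug, PartialEq)]
pub enum Step {
    /// Statuses match: short-circuit and hand off to build_result.
    BuildResult,
    /// Statuses differ but fewer than MIN_SAMPLES pairs were collected.
    NeedMore,
    /// Statuses differ consistently: run the full classification pipeline.
    Classify,
    /// Statuses vary between rounds: NotPresent with an instability note.
    Unstable,
}

/// Minimum number of pairs before a differing status is trusted.
pub const MIN_SAMPLES: usize = 3;

pub fn sampling_step(baseline: &[u16], probe: &[u16]) -> Step {
    // Assumption: with no samples at all, ask for more.
    let (Some(&b0), Some(&p0)) = (baseline.first(), probe.first()) else {
        return Step::NeedMore;
    };
    if b0 == p0 {
        return Step::BuildResult;
    }
    if baseline.len() < MIN_SAMPLES || probe.len() < MIN_SAMPLES {
        return Step::NeedMore;
    }
    // Consistency: every baseline status matches the first baseline status,
    // and every probe status matches the first probe status.
    let stable = baseline.iter().all(|&s| s == b0) && probe.iter().all(|&s| s == p0);
    if stable {
        Step::Classify
    } else {
        Step::Unstable
    }
}
```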

Internal Architecture

Three-Layer Scoring Pipeline

The classification pipeline in existence/classifier.rs composes three layers:

Layer 1: Pattern Table Lookup (existence/patterns.rs)

A static lookup table mapping (baseline_status, probe_status) pairs to base confidence, base impact, labels, leak descriptions, and RFC basis strings. Patterns are grouped by confidence tier: strong (base_confidence >= 85), upper-moderate (82–84), lower-moderate (80–82), and weak (< 80). This is the coarsest signal — a status code differential alone provides base evidence.
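
As a rough illustration of the lookup, the sketch below maps a status pair to a `PatternMatch` and classifies its tier. The two table entries, their scores, and the `lookup` and `tier` function names are hypothetical; the real table in existence/patterns.rs is larger and also carries leak descriptions and RFC basis strings. Since the documented moderate ranges both include 82, the sketch assumes 82 belongs to the upper-moderate tier.

```rust
#[derive(Debug, PartialEq)]
pub enum Tier {
    Strong,
    UpperModerate,
    LowerModerate,
    Weak,
}

/// Trimmed-down stand-in for the real PatternMatch.
#[derive(Debug, PartialEq)]
pub struct PatternMatch {
    pub base_confidence: u8,
    pub base_impact: u8,
    pub label: &'static str,
}

/// Hypothetical entries; the scores are made up for the example.
pub fn lookup(baseline: u16, probe: u16) -> Option<PatternMatch> {
    match (baseline, probe) {
        (200, 404) => Some(PatternMatch {
            base_confidence: 90,
            base_impact: 20,
            label: "OK vs Not Found",
        }),
        (403, 404) => Some(PatternMatch {
            base_confidence: 83,
            base_impact: 15,
            label: "Forbidden vs Not Found",
        }),
        _ => None,
    }
}

/// Maps a base confidence to its tier.
/// Assumption: 82 is treated as upper-moderate.
pub fn tier(base_confidence: u8) -> Tier {
    match base_confidence {
        85..=u8::MAX => Tier::Strong,
        82..=84 => Tier::UpperModerate,
        80..=81 => Tier::LowerModerate,
        _ => Tier::Weak,
    }
}
```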

Layer 2: Signal-Weighted Scoring (existence/signal_weights.rs, existence/families.rs)

Four signal extractors run unconditionally on every DifferentialSet:

Extractor	Module	What it detects
Status code	`signals/status_code.rs`	Status code differential between baseline/probe
Header	`signals/header.rs`	Header presence differential, header value differential
Metadata	`signals/metadata.rs`	Metadata leaks (Content-Range size, ETag values)
Body	`signals/body.rs`	Body content differential, content-type mismatch

Each signal is weighted by kind and evidence content:

Signal	Raw Confidence	Raw Impact	Family
`Content-Range` header presence	12	8	Range
`ETag` header presence	10	5	CacheValidator
`Last-Modified` header presence	8	5	CacheValidator
`WWW-Authenticate` header presence	8	8	Auth
`Accept-Ranges` header presence	5	0	Range
`Allow` header presence	6	5	General
Generic header presence	3	0	General
Content-Range size leak	5	15	Range
ETag metadata leak	3	5	(from evidence)
Body content diff	70	15	ErrorBody
Body content-type mismatch	25	10	ErrorBody
Header value diff	4	3	(from evidence)

Weights are modified by two multipliers:

Normative strength: Must/MustNot = 1.0x, Should = 0.9x, May = 0.75x
Body diff attenuation: When status codes already differ, body diff confidence is reduced to 0.25x (body differences are expected when handlers differ)
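
To make the arithmetic concrete, here is a small sketch of how the two multipliers might combine. It assumes they compose multiplicatively on a floating-point weight with no rounding, which the text does not confirm; `Normative`, `weighted_confidence`, and `BODY_DIFF_ATTENUATION` are hypothetical names.

```rust
#[derive(Clone, Copy)]
pub enum Normative {
    Must,
    MustNot,
    Should,
    May,
}

impl Normative {
    /// Multiplier for the normative strength of the underlying requirement.
    pub fn multiplier(self) -> f64 {
        match self {
            Normative::Must | Normative::MustNot => 1.0,
            Normative::Should => 0.9,
            Normative::May => 0.75,
        }
    }
}

/// Applied to body diff confidence when status codes already differ.
pub const BODY_DIFF_ATTENUATION: f64 = 0.25;

/// Assumption: both multipliers are applied as a simple product.
pub fn weighted_confidence(raw: f64, strength: Normative, attenuate_body_diff: bool) -> f64 {
    let attenuation = if attenuate_body_diff {
        BODY_DIFF_ATTENUATION
    } else {
        1.0
    };
    raw * strength.multiplier() * attenuation
}
```

Under these assumptions, a body content diff (raw confidence 70) observed alongside differing status codes contributes 70 × 1.0 × 0.25 = 17.5, while a raw weight of 12 backed by a Should-level requirement contributes 12 × 0.9 = 10.8. The pairing of a given signal with a given strength here is illustrative only.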

Layer 3: Family Adjustment & Verdict Derivation (existence/families.rs, existence/scoring.rs)

Signals from the same RFC mechanism are grouped into families to prevent correlated evidence from inflating scores:

Family	Signals
Range	206, 416, Content-Range, Accept-Ranges
CacheValidator	304, ETag, Last-Modified
Auth	401, 403, WWW-Authenticate
Precondition	412, If-Match/If-Unmodified-Since
Negotiation	406, Accept
ErrorBody	Body content differentials
General	Everything else

Diminishing returns within a family:

1st signal: full confidence (capped at 75)
2nd signal: 50% confidence
3rd+ signal: 0% confidence
Impact points always count fully regardless of family position

Corroboration bonus for independent signal families:

2 families: +3 confidence
3 families: +6 confidence
4+ families: +8 confidence
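
The diminishing-returns and corroboration rules above can be sketched as follows. This is an illustration under assumptions: signals are taken in the order given (the real ordering within a family is not stated), families are identified by plain strings, and the result covers only the signal contribution, not how it is merged with the pattern table's base confidence. The function names are hypothetical.

```rust
use std::collections::BTreeMap;

/// Cap on the first signal's confidence within a family.
pub const FIRST_SIGNAL_CAP: f64 = 75.0;

/// Confidence contributed by one family, given its signal weights in order.
pub fn family_confidence(weights: &[f64]) -> f64 {
    weights
        .iter()
        .enumerate()
        .map(|(position, &weight)| match position {
            0 => weight.min(FIRST_SIGNAL_CAP), // 1st signal: full, capped
            1 => weight * 0.5,                 // 2nd signal: 50%
            _ => 0.0,                          // 3rd and later: nothing
        })
        .sum()
}

/// Bonus for the number of independent families that produced signals.
pub fn corroboration_bonus(families: usize) -> f64 {
    match families {
        0 | 1 => 0.0,
        2 => 3.0,
        3 => 6.0,
        _ => 8.0,
    }
}

/// Groups (family, weight) pairs by family, applies diminishing returns
/// per family, then adds the corroboration bonus.
pub fn adjusted_confidence(signals: &[(&str, f64)]) -> f64 {
    let mut by_family: BTreeMap<&str, Vec<f64>> = BTreeMap::new();
    for &(family, weight) in signals {
        by_family.entry(family).or_default().push(weight);
    }
    let total: f64 = by_family.values().map(|w| family_confidence(w)).sum();
    total + corroboration_bonus(by_family.len())
}
```

For example, with Content-Range presence (12) and Accept-Ranges presence (5) in the Range family, plus ETag presence (10) and Last-Modified presence (8) in CacheValidator, the sketch yields (12 + 2.5) + (10 + 4) + 3 = 31.5.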

Verdict thresholds:

Confidence >= 80: Confirmed
Confidence >= 60: Likely
Below 60: NotPresent

Severity gating:

Confirmed verdict: severity = impact class directly (High/Medium/Low)
Likely verdict: severity capped one level below impact class (High->Medium, Medium->Low, Low->Low)
NotPresent / Inconclusive: no severity

Impact class derivation:

Size leak present: High
Metadata signals + impact score >= 40: Medium
Impact score >= 35: Medium
Otherwise: Low
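
The verdict, impact class, and severity rules above fit in a few match arms. The sketch below is one possible rendering, assuming integer scores; the enum and function names are stand-ins for whatever existence/scoring.rs actually defines.

```rust
#[derive(Debug, PartialEq, Clone, Copy)]
pub enum Verdict {
    Confirmed,
    Likely,
    NotPresent,
    Inconclusive,
}

#[derive(Debug, PartialEq, Clone, Copy)]
pub enum Severity {
    High,
    Medium,
    Low,
}

/// Verdict from the final confidence score.
pub fn verdict(confidence: u8) -> Verdict {
    match confidence {
        80..=u8::MAX => Verdict::Confirmed,
        60..=79 => Verdict::Likely,
        _ => Verdict::NotPresent,
    }
}

/// Impact class, checked in the documented order.
pub fn impact_class(size_leak: bool, has_metadata: bool, impact: u8) -> Severity {
    if size_leak {
        Severity::High
    } else if has_metadata && impact >= 40 {
        Severity::Medium
    } else if impact >= 35 {
        // As documented, this rule also covers every case the
        // metadata rule above would catch.
        Severity::Medium
    } else {
        Severity::Low
    }
}

/// Severity gating: Confirmed passes the class through, Likely is
/// capped one level lower, everything else carries no severity.
pub fn severity(verdict: Verdict, class: Severity) -> Option<Severity> {
    match verdict {
        Verdict::Confirmed => Some(class),
        Verdict::Likely => Some(match class {
            Severity::High => Severity::Medium,
            Severity::Medium | Severity::Low => Severity::Low,
        }),
        Verdict::NotPresent | Verdict::Inconclusive => None,
    }
}
```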

Module Map

parlov-analysis/src/
├── lib.rs              # Analyzer trait, SampleDecision, re-exports
├── existence/
│   ├── mod.rs          # Re-exports ExistenceAnalyzer
│   ├── analyzer.rs     # ExistenceAnalyzer impl, adaptive sampling logic
│   ├── classifier.rs   # Three-layer pipeline composition
│   ├── patterns.rs     # Static (baseline, probe) -> PatternMatch table
│   ├── scoring.rs      # Confidence computation, verdict/severity derivation
│   ├── families.rs     # Signal family definitions, diminishing returns
│   └── signal_weights.rs  # Per-signal raw confidence/impact weights
└── signals/
    ├── mod.rs          # Test helpers (fake_exchange, diff_set builders)
    ├── status_code.rs  # Status code differential extractor
    ├── header.rs       # Header presence/value differential extractor
    ├── metadata.rs     # Metadata leak extractor
    └── body.rs         # Body content differential extractor

Extension Points

Adding a new oracle class analyzer: Implement the Analyzer trait. The analyzer receives DifferentialSet values and returns SampleDecision. Register it in the binary crate's subcommand dispatch.

Adding a new signal extractor: Create a new module under signals/. The extractor is a function fn extract(data: &DifferentialSet) -> Vec<Signal>. Add the call to extract_all_signals() in existence/analyzer.rs. Add corresponding weights in existence/signal_weights.rs.
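
A new extractor might look roughly like the sketch below. Only the `extract` signature comes from the text above; the `Exchange`, `DifferentialSet`, and `Signal` definitions are simplified stand-ins, and the `Location` presence check is a purely illustrative choice (the existing header extractor may already cover it).

```rust
// Simplified stand-ins for the real parlov-core types.
pub struct Exchange {
    pub status: u16,
    pub headers: Vec<(String, String)>,
}

pub struct DifferentialSet {
    pub baseline: Vec<Exchange>,
    pub probe: Vec<Exchange>,
}

/// Hypothetical signal shape: a kind tag plus free-form evidence.
#[derive(Debug, PartialEq)]
pub struct Signal {
    pub kind: &'static str,
    pub evidence: String,
}

/// Case-insensitive header presence check.
fn has_header(exchange: &Exchange, name: &str) -> bool {
    exchange
        .headers
        .iter()
        .any(|(key, _)| key.eq_ignore_ascii_case(name))
}

/// Emits one signal when exactly one side of the first pair
/// carries a Location header.
pub fn extract(data: &DifferentialSet) -> Vec<Signal> {
    let (Some(baseline), Some(probe)) = (data.baseline.first(), data.probe.first()) else {
        return Vec::new();
    };
    if has_header(baseline, "location") != has_header(probe, "location") {
        vec![Signal {
            kind: "header_presence",
            evidence: "Location".to_string(),
        }]
    } else {
        Vec::new()
    }
}
```

Keeping the extractor a pure function of `&DifferentialSet` matches the engine's no-I/O design and lets it be unit-tested with hand-built exchanges.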

Adding a new signal family: Add a variant to the SignalFamily enum in families.rs. Map the relevant headers or status codes to it in header_family() and status_code_family(). Add tests.

Tuning scoring: Pattern base scores live in patterns.rs. Signal weights live in signal_weights.rs. Family caps, diminishing return curves, and corroboration bonuses live in families.rs. Verdict thresholds and severity gating live in scoring.rs. All are constants or simple match arms.

