parlov-analysis
Analysis engine — the Analyzer trait, ExistenceAnalyzer, three-layer scoring pipeline, signal extractors, and family-based deduplication.
Version: 0.5.0 | Files: 16 | Lines: 2,781 | Dependencies: parlov-core, http, bytes
The engine takes DifferentialSet values (paired baseline/probe exchanges) and produces OracleResult values (verdicts, confidence scores, severity, and evidence). It is pure synchronous computation: no I/O, no async.
Public API
The Analyzer Trait
pub trait Analyzer: Send + Sync {
    /// Decides whether enough data exists or more samples are needed.
    fn evaluate(&self, data: &DifferentialSet) -> SampleDecision;

    /// Which oracle class this analyzer serves.
    fn oracle_class(&self) -> OracleClass;

    /// Provided method: calls evaluate(), unwraps Complete, or panics.
    /// Convenience for callers who know they have enough data.
    fn analyze(&self, data: &DifferentialSet) -> OracleResult;
}
SampleDecision
pub enum SampleDecision {
    /// Analysis complete. Contains the final OracleResult.
    Complete(Box<OracleResult>),
    /// More samples needed. The scheduler should collect additional exchanges
    /// and call evaluate() again.
    NeedMore,
}
The adaptive sampling loop: the scheduler calls evaluate() after each sample round. If the analyzer returns NeedMore, the scheduler executes another probe pair and appends the exchanges to the DifferentialSet. This continues until Complete or a max-sample ceiling is hit.
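The loop described above could look roughly like the following. This is a minimal sketch under stated assumptions, not the scheduler's actual code: `collect_until_complete`, `next_pair`, and `max_samples` are hypothetical names, and `DifferentialSet` and `OracleResult` are simplified stand-ins for the parlov-core types.

```rust
// Stand-in types (assumption): the real DifferentialSet and OracleResult
// live in parlov-core and carry full exchanges, not bare status codes.
#[derive(Default)]
struct DifferentialSet {
    baseline: Vec<u16>,
    probe: Vec<u16>,
}

#[derive(Debug, PartialEq)]
struct OracleResult {
    samples_used: usize,
}

enum SampleDecision {
    Complete(Box<OracleResult>),
    NeedMore,
}

/// Hypothetical scheduler loop: collect one baseline/probe pair per round,
/// append it to the set, and ask the analyzer whether it has seen enough.
/// Returns None when the max-sample ceiling is hit without a Complete.
fn collect_until_complete(
    evaluate: impl Fn(&DifferentialSet) -> SampleDecision,
    mut next_pair: impl FnMut() -> (u16, u16),
    max_samples: usize,
) -> Option<OracleResult> {
    let mut data = DifferentialSet::default();
    for _ in 0..max_samples {
        let (baseline_status, probe_status) = next_pair();
        data.baseline.push(baseline_status);
        data.probe.push(probe_status);
        if let SampleDecision::Complete(result) = evaluate(&data) {
            return Some(*result);
        }
    }
    None
}
```

How the real scheduler reports a ceiling hit is not documented here; returning None is just one way to make that case explicit.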
ExistenceAnalyzer
The concrete implementation for the Existence oracle class. This is currently the only analyzer.
pub struct ExistenceAnalyzer;
Sampling logic in evaluate():
- Compare baseline[0].status vs probe[0].status
- If identical: short-circuit to Complete after 1 sample via build_result. When no body/header signals exist, this produces NotPresent. When signals are present (e.g. same status code but different response bodies), the full classification pipeline runs and can produce Likely or Confirmed
- If different and < 3 samples: return NeedMore
- If different and >= 3 samples: check consistency (all baseline statuses match, all probe statuses match)
  - Stable: run full classification pipeline
  - Unstable: return NotPresent with an instability annotation
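The sampling rules above can be sketched as a small decision function. This is an illustration only: `Step`, `next_step`, and `MIN_SAMPLES` are hypothetical names, and the sketch works on bare status codes rather than the real exchange types.

```rust
/// Hypothetical outcome of one evaluate() round, before any scoring runs.
#[derive(Debug, PartialEq)]
enum Step {
    /// First-pair statuses match: finish after one sample.
    ShortCircuit,
    /// Statuses differ but fewer than three samples exist.
    NeedMore,
    /// Three or more consistent samples: run the full classification pipeline.
    Classify,
    /// Statuses drift between samples: NotPresent with an instability annotation.
    Unstable,
}

const MIN_SAMPLES: usize = 3;

fn next_step(baseline: &[u16], probe: &[u16]) -> Step {
    // Assumption: an empty side simply means "not enough data yet".
    let (first_b, first_p) = match (baseline.first(), probe.first()) {
        (Some(b), Some(p)) => (*b, *p),
        _ => return Step::NeedMore,
    };
    if first_b == first_p {
        return Step::ShortCircuit;
    }
    if baseline.len() < MIN_SAMPLES || probe.len() < MIN_SAMPLES {
        return Step::NeedMore;
    }
    // Consistency check: every baseline status matches the first baseline
    // status, and every probe status matches the first probe status.
    let stable = baseline.iter().all(|&s| s == first_b)
        && probe.iter().all(|&s| s == first_p);
    if stable { Step::Classify } else { Step::Unstable }
}
```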
Internal Architecture
Three-Layer Scoring Pipeline
The classification pipeline in existence/classifier.rs composes three layers:
Layer 1: Pattern Table Lookup (existence/patterns.rs)
A static lookup table mapping (baseline_status, probe_status) pairs to base confidence, base impact, labels, leak descriptions, and RFC basis strings. Patterns are grouped by confidence tier: strong (base_confidence >= 85), upper-moderate (82–84), lower-moderate (80–82), and weak (< 80). This is the coarsest signal — a status code differential alone provides base evidence.
Layer 2: Signal-Weighted Scoring (existence/signal_weights.rs, existence/families.rs)
Four signal extractors run unconditionally on every DifferentialSet:
| Extractor | Module | What it detects |
|---|---|---|
| Status code | signals/status_code.rs | Status code differential between baseline/probe |
| Header | signals/header.rs | Header presence differential, header value differential |
| Metadata | signals/metadata.rs | Metadata leaks (Content-Range size, ETag values) |
| Body | signals/body.rs | Body content differential, content-type mismatch |
Each signal is weighted by kind and evidence content:
| Signal | Raw Confidence | Raw Impact | Family |
|---|---|---|---|
| Content-Range header presence | 12 | 8 | Range |
| ETag header presence | 10 | 5 | CacheValidator |
| Last-Modified header presence | 8 | 5 | CacheValidator |
| WWW-Authenticate header presence | 8 | 8 | Auth |
| Accept-Ranges header presence | 5 | 0 | Range |
| Allow header presence | 6 | 5 | General |
| Generic header presence | 3 | 0 | General |
| Content-Range size leak | 5 | 15 | Range |
| ETag metadata leak | 3 | 5 | (from evidence) |
| Body content diff | 70 | 15 | ErrorBody |
| Body content-type mismatch | 25 | 10 | ErrorBody |
| Header value diff | 4 | 3 | (from evidence) |
Weights are modified by two multipliers:
- Normative strength: Must/MustNot = 1.0x, Should = 0.9x, May = 0.75x
- Body diff attenuation: When status codes already differ, body diff confidence is reduced to 0.25x (body differences are expected when handlers differ)
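As a rough illustration of how the two multipliers might combine, the sketch below applies them multiplicatively to a raw confidence value. That composition, the use of f64, and the names `NormativeStrength` and `weighted_confidence` are assumptions for this example; the actual arithmetic in signal_weights.rs may round or order things differently.

```rust
#[derive(Clone, Copy)]
enum NormativeStrength {
    Must,
    MustNot,
    Should,
    May,
}

impl NormativeStrength {
    /// Multipliers as listed above.
    fn multiplier(self) -> f64 {
        match self {
            Self::Must | Self::MustNot => 1.0,
            Self::Should => 0.9,
            Self::May => 0.75,
        }
    }
}

/// Hypothetical helper: scales a raw confidence weight by normative strength
/// and, for body diffs observed alongside a status differential, by 0.25.
fn weighted_confidence(
    raw: f64,
    strength: NormativeStrength,
    is_body_diff: bool,
    status_differs: bool,
) -> f64 {
    let attenuation = if is_body_diff && status_differs { 0.25 } else { 1.0 };
    raw * strength.multiplier() * attenuation
}
```

Under these assumptions, a body content diff (raw 70) seen together with a status differential contributes 17.5, while a Content-Range presence signal (raw 12) backed by a May-level requirement contributes 9.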
Layer 3: Family Adjustment & Verdict Derivation (existence/families.rs, existence/scoring.rs)
Signals from the same RFC mechanism are grouped into families to prevent correlated evidence from inflating scores:
| Family | Signals |
|---|---|
| Range | 206, 416, Content-Range, Accept-Ranges |
| CacheValidator | 304, ETag, Last-Modified |
| Auth | 401, 403, WWW-Authenticate |
| Precondition | 412, If-Match/If-Unmodified-Since |
| Negotiation | 406, Accept |
| ErrorBody | Body content differentials |
| General | Everything else |
Diminishing returns within a family:
- 1st signal: full confidence (capped at 75)
- 2nd signal: 50% confidence
- 3rd+ signal: 0% confidence
- Impact points always count fully regardless of family position
Corroboration bonus for independent signal families:
- 2 families: +3 confidence
- 3 families: +6 confidence
- 4+ families: +8 confidence
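The diminishing-returns and corroboration rules above can be sketched as follows. This is a simplified model, not the code in families.rs: the function names are hypothetical, signals are assumed to be ranked in the order they are passed in, and the sketch does not model how the Layer 1 base confidence is combined with the result.

```rust
use std::collections::HashMap;

#[derive(Clone, Copy, PartialEq, Eq, Hash)]
enum Family {
    Range,
    CacheValidator,
    Auth,
    ErrorBody,
    General,
}

/// Confidence contributed by one family: the first signal counts fully
/// (capped at 75), the second at 50%, and any further signals at 0%.
fn family_confidence(signals: &[f64]) -> f64 {
    signals
        .iter()
        .enumerate()
        .map(|(position, &confidence)| match position {
            0 => confidence.min(75.0),
            1 => confidence * 0.5,
            _ => 0.0,
        })
        .sum()
}

/// Bonus for evidence arriving from independent families.
fn corroboration_bonus(family_count: usize) -> f64 {
    match family_count {
        0 | 1 => 0.0,
        2 => 3.0,
        3 => 6.0,
        _ => 8.0,
    }
}

/// Hypothetical aggregate: group weighted signals by family, apply
/// diminishing returns per family, then add the corroboration bonus.
fn signal_confidence(signals: &[(Family, f64)]) -> f64 {
    let mut by_family: HashMap<Family, Vec<f64>> = HashMap::new();
    for &(family, confidence) in signals {
        by_family.entry(family).or_default().push(confidence);
    }
    let total: f64 = by_family.values().map(|v| family_confidence(v)).sum();
    total + corroboration_bonus(by_family.len())
}
```

For example, two Range signals (12 and 5) plus two CacheValidator signals (10 and 8) would yield 12 + 2.5 + 10 + 4 = 28.5, plus 3 for two families, giving 31.5 in this model.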
Verdict thresholds:
- Confidence >= 80: Confirmed
- Confidence >= 60: Likely
- Below 60: NotPresent
Severity gating:
- Confirmed verdict: severity = impact class directly (High/Medium/Low)
- Likely verdict: severity capped one level below impact class (High->Medium, Medium->Low, Low->Low)
- NotPresent/Inconclusive: no severity
Impact class derivation:
- Size leak present: High
- Metadata signals + impact score >= 40: Medium
- Impact score >= 35: Medium
- Otherwise: Low
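Taken together, the verdict thresholds, severity gating, and impact class rules amount to three small functions. The sketch below mirrors the listed rules in order; the enum and function names and the integer score types are assumptions for illustration, not the signatures in scoring.rs.

```rust
#[derive(Debug, PartialEq, Clone, Copy)]
enum Verdict {
    Confirmed,
    Likely,
    NotPresent,
    Inconclusive,
}

#[derive(Debug, PartialEq, Clone, Copy)]
enum Severity {
    High,
    Medium,
    Low,
}

/// Verdict thresholds: >= 80 Confirmed, >= 60 Likely, otherwise NotPresent.
fn verdict(confidence: u8) -> Verdict {
    if confidence >= 80 {
        Verdict::Confirmed
    } else if confidence >= 60 {
        Verdict::Likely
    } else {
        Verdict::NotPresent
    }
}

/// Impact class derivation, checked in the order the rules are listed.
fn impact_class(size_leak: bool, has_metadata_signals: bool, impact: u32) -> Severity {
    if size_leak {
        Severity::High
    } else if has_metadata_signals && impact >= 40 {
        Severity::Medium
    } else if impact >= 35 {
        Severity::Medium
    } else {
        Severity::Low
    }
}

/// Severity gating: Confirmed passes the impact class through, Likely is
/// capped one level lower, and other verdicts carry no severity.
fn severity(verdict: Verdict, class: Severity) -> Option<Severity> {
    match verdict {
        Verdict::Confirmed => Some(class),
        Verdict::Likely => Some(match class {
            Severity::High => Severity::Medium,
            Severity::Medium | Severity::Low => Severity::Low,
        }),
        Verdict::NotPresent | Verdict::Inconclusive => None,
    }
}
```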
Module Map
parlov-analysis/src/
├── lib.rs                  # Analyzer trait, SampleDecision, re-exports
├── existence/
│   ├── mod.rs              # Re-exports ExistenceAnalyzer
│   ├── analyzer.rs         # ExistenceAnalyzer impl, adaptive sampling logic
│   ├── classifier.rs       # Three-layer pipeline composition
│   ├── patterns.rs         # Static (baseline, probe) -> PatternMatch table
│   ├── scoring.rs          # Confidence computation, verdict/severity derivation
│   ├── families.rs         # Signal family definitions, diminishing returns
│   └── signal_weights.rs   # Per-signal raw confidence/impact weights
└── signals/
    ├── mod.rs              # Test helpers (fake_exchange, diff_set builders)
    ├── status_code.rs      # Status code differential extractor
    ├── header.rs           # Header presence/value differential extractor
    ├── metadata.rs         # Metadata leak extractor
    └── body.rs             # Body content differential extractor
Extension Points
Adding a new oracle class analyzer: Implement the Analyzer trait. The analyzer receives DifferentialSet values and returns SampleDecision. Register it in the binary crate's subcommand dispatch.
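A new analyzer might be shaped like the sketch below. Everything except the trait's method names is an assumption: the types are simplified stand-ins for the parlov-core ones, `ExampleAnalyzer` and `OracleClass::Example` are hypothetical, the confidence values are placeholders, and the body of `analyze` is only one way to satisfy the "unwraps Complete, or panics" contract.

```rust
// Stand-ins for parlov-core types (assumption): reduced to what the sketch needs.
struct DifferentialSet {
    baseline: Vec<u16>,
    probe: Vec<u16>,
}

#[derive(Debug, PartialEq)]
struct OracleResult {
    confidence: u8,
}

#[derive(Debug, PartialEq)]
enum OracleClass {
    Existence,
    Example, // hypothetical new oracle class
}

enum SampleDecision {
    Complete(Box<OracleResult>),
    NeedMore,
}

trait Analyzer: Send + Sync {
    fn evaluate(&self, data: &DifferentialSet) -> SampleDecision;
    fn oracle_class(&self) -> OracleClass;

    /// Provided method: unwraps Complete, panics on NeedMore.
    fn analyze(&self, data: &DifferentialSet) -> OracleResult {
        match self.evaluate(data) {
            SampleDecision::Complete(result) => *result,
            SampleDecision::NeedMore => panic!("analyze() called with too few samples"),
        }
    }
}

/// Hypothetical analyzer: asks for data until one pair exists, then scores
/// a status differential with a placeholder confidence.
struct ExampleAnalyzer;

impl Analyzer for ExampleAnalyzer {
    fn evaluate(&self, data: &DifferentialSet) -> SampleDecision {
        match (data.baseline.first(), data.probe.first()) {
            (Some(b), Some(p)) => {
                let confidence = if b != p { 80 } else { 0 }; // placeholder values
                SampleDecision::Complete(Box::new(OracleResult { confidence }))
            }
            _ => SampleDecision::NeedMore,
        }
    }

    fn oracle_class(&self) -> OracleClass {
        OracleClass::Example
    }
}
```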
Adding a new signal extractor: Create a new module under signals/. The extractor is a function fn extract(data: &DifferentialSet) -> Vec<Signal>. Add the call to extract_all_signals() in existence/analyzer.rs. Add corresponding weights in existence/signal_weights.rs.
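An extractor with the `fn extract(data: &DifferentialSet) -> Vec<Signal>` shape could look like the sketch below. The `Exchange`, `DifferentialSet`, and `Signal` definitions are simplified stand-ins, and the choice of Retry-After as the header to watch is purely illustrative; the real types and signal kinds come from parlov-core and the existing signals/ modules.

```rust
// Stand-in types (assumption): the real ones carry full HTTP exchanges.
struct Exchange {
    headers: Vec<(String, String)>,
}

struct DifferentialSet {
    baseline: Vec<Exchange>,
    probe: Vec<Exchange>,
}

#[derive(Debug, PartialEq)]
struct Signal {
    kind: &'static str,
    evidence: String,
}

/// Header this hypothetical extractor watches; chosen only for illustration.
const HEADER: &str = "retry-after";

fn has_header(exchange: &Exchange, name: &str) -> bool {
    exchange
        .headers
        .iter()
        .any(|(key, _)| key.eq_ignore_ascii_case(name))
}

/// Emits one signal when the header appears on exactly one side of the
/// first baseline/probe pair, and nothing otherwise.
fn extract(data: &DifferentialSet) -> Vec<Signal> {
    let mut signals = Vec::new();
    if let (Some(baseline), Some(probe)) = (data.baseline.first(), data.probe.first()) {
        let in_baseline = has_header(baseline, HEADER);
        let in_probe = has_header(probe, HEADER);
        if in_baseline != in_probe {
            let side = if in_baseline { "baseline" } else { "probe" };
            signals.push(Signal {
                kind: "header_presence",
                evidence: format!("{} present only in {}", HEADER, side),
            });
        }
    }
    signals
}
```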
Adding a new signal family: Add a variant to the SignalFamily enum in families.rs. Map the relevant headers or status codes to it in header_family() and status_code_family(). Add tests.
Tuning scoring: Pattern base scores live in patterns.rs. Signal weights live in signal_weights.rs. Family caps, diminishing return curves, and corroboration bonuses live in families.rs. Verdict thresholds and severity gating live in scoring.rs. All are constants or simple match arms.