
discover holds against SAGE

discover is good because it asks coverage questions: soft intent framing, multi-select scope sweeps, named alternatives in non-goals. Half of those questions have no downstream parser; they exist for the operator’s own framing. They’re what keep the FRD on the ask you actually made.

We A/B’d it against an adaptation of SAGE-Agent (arXiv 2511.08798), which replaces the question tree with EVPI scoring over a small candidate set and a principled stop signal. 16 cases stratified across task type × codebase familiarity, two blind LLM judges (Opus + Sonnet), pre-registered 7-dimension rubric. Inter-judge Pearson 0.696, exact-match 81%, within-one 100%.
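For readers who want to reproduce the three agreement figures on their own judge outputs, here is a minimal sketch. The score lists are invented for illustration, and the 0 to 3 per-dimension scale is an assumption inferred from the 21-point total; none of this is the harness we actually ran.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation between two equal-length score lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def agreement(judge_a, judge_b):
    """Correlation, exact-match rate, and within-one rate for two judges."""
    n = len(judge_a)
    pairs = list(zip(judge_a, judge_b))
    return {
        "pearson": pearson(judge_a, judge_b),
        "exact": sum(a == b for a, b in pairs) / n,
        "within_one": sum(abs(a - b) <= 1 for a, b in pairs) / n,
    }

# Hypothetical per-item scores on an assumed 0-3 scale (not our real data).
judge_a = [3, 2, 3, 1, 2, 3, 0, 2]
judge_b = [3, 2, 2, 1, 2, 3, 1, 2]
stats = agreement(judge_a, judge_b)
print(stats)
```

Exact-match and within-one are the easier numbers to read on a short ordinal scale; Pearson is sensitive to how much of the scale the judges actually use.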

Scores (joint, mean across both judges; the total is out of 21). The W/L/D column is the variant’s head-to-head record against discover across all 64 pair-comparisons per dimension (8 FRDs × 8 FRDs):

| Dimension | discover | SAGE variant | variant vs discover W/L/D | Effect |
|---|---|---|---|---|
| D1 Completeness | 3.00 | 2.75 | 0 / 24 / 40 | medium |
| D2 Specificity | 2.88 | 2.69 | 6 / 21 / 37 | small |
| D3 No-impl-leak | 2.38 | 2.56 | 31 / 16 / 17 | small (variant) |
| D4 Anti-rescoping | 2.81 | 2.00 | 2 / 51 / 11 | large |
| D5 Consistency | 2.94 | 2.94 | 7 / 7 / 50 | tie |
| D6 Auditability | 2.56 | 2.19 | 15 / 33 / 16 | small |
| D7 Actionability | 2.69 | 2.44 | 16 / 29 / 19 | small |
| Total | 19.26 | 17.57 | 77 / 181 / 190 of 448 | discover +1.69 |
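The W/L/D record is an all-pairs tally: every variant FRD is compared with every discover FRD on one dimension. A rough sketch of that count follows. The per-FRD scores in `example` are made up, and `tally` is a hypothetical helper, not our evaluation code; the `reported` values are copied from the table, so the last lines only check that the column is internally consistent.

```python
from itertools import product

def tally(variant_scores, discover_scores):
    """Count variant wins, losses, and draws over all cross pairs."""
    wins = losses = draws = 0
    for v, d in product(variant_scores, discover_scores):
        if v > d:
            wins += 1
        elif v < d:
            losses += 1
        else:
            draws += 1
    return wins, losses, draws

# Invented scores for 8 variant FRDs and 8 discover FRDs on one dimension.
example = tally([3, 2, 2, 2, 2, 2, 2, 1], [3, 3, 3, 3, 3, 3, 2, 2])

# W/L/D per dimension as reported in the table above.
reported = {
    "D1": (0, 24, 40), "D2": (6, 21, 37), "D3": (31, 16, 17),
    "D4": (2, 51, 11), "D5": (7, 7, 50), "D6": (15, 33, 16),
    "D7": (16, 29, 19),
}
row_sums = {dim: sum(wld) for dim, wld in reported.items()}
totals = tuple(sum(col) for col in zip(*reported.values()))
print(example, row_sums, totals)
```

Each row sums to 64 (8 × 8) and the seven rows sum to 448, which matches the Total line.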

Five dimensions favor discover, one ties, one (D3) favors the variant. The deciding dimension is D4, anti-rescoping discipline, by a large effect. EVPI’s turn-saving mechanism skips exactly the comprehensive scope-coverage sweeps that drive D4. Optimizing for “smallest question set to disambiguate intent” turns out to be the wrong objective when the artifact is a requirements document; the right objective is “did we consider what I’m leaving out?”
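To make the mechanism concrete, here is a toy value-of-information score in the spirit of EVPI. It is a sketch under strong assumptions, not SAGE-Agent's algorithm and not our adaptation: the belief is a plain probability table over candidate intents, utility is 1 for committing to the right candidate and 0 otherwise, and the question names, probabilities, and cost are all hypothetical.

```python
from collections import defaultdict

def value_of_question(belief, answer_of):
    """Expected gain in P(correct guess) from hearing the answer.

    belief: candidate -> probability.
    answer_of: candidate -> the answer the operator would give if that
    candidate were the true intent.
    """
    best_by_answer = defaultdict(float)
    for candidate, p in belief.items():
        answer = answer_of[candidate]
        best_by_answer[answer] = max(best_by_answer[answer], p)
    # After the answer we pick the likeliest candidate consistent with it;
    # without asking we can only pick the overall likeliest candidate.
    return sum(best_by_answer.values()) - max(belief.values())

def pick_question(belief, questions, cost):
    """Return the highest-value question, or None as the stop signal."""
    scores = {q: value_of_question(belief, a) for q, a in questions.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > cost else None

belief = {"A": 0.5, "B": 0.3, "C": 0.2}
questions = {
    # Splits A from B and C, so it changes which candidate we would pick.
    "is_it_A": {"A": "yes", "B": "no", "C": "no"},
    # A coverage sweep: same answer whatever the intent, so it scores zero.
    "scope_sweep": {"A": "n/a", "B": "n/a", "C": "n/a"},
}
sweep_value = value_of_question(belief, questions["scope_sweep"])
chosen = pick_question(belief, questions, cost=0.1)
print(sweep_value, chosen)
```

In this toy setup a question that does not separate the candidates is worth nothing, so a scorer of this kind never asks it. That is the shape of the D4 result: the sweep does no work for disambiguation, yet it is the question that records what was left out.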

We’re keeping discover. The variant’s candidate-shape framing and Belief Trace audit block are worth porting back as a smaller change, without the EVPI scoring that suppressed the sweeps in the first place.