---
source: arxiv
url: http://arxiv.org/abs/2603.10478v1
published_at: '2026-03-11T07:05:29'
authors:
- Tim Menzies
- Kishan Kumar Ganguly
topics:
- software-testing
- black-box-optimization
- configuration-tuning
- search-based-software-engineering
- sampling
relevance_score: 0.76
run_id: materialize-outputs
language_code: en
---

# From Verification to Herding: Exploiting Software's Sparsity of Influence

## Summary
This paper argues for shifting software testing/verification from “model first, then prove” to sampling-based **herding**, resting on the key premise that software behavior is often controlled by only a small number of critical variables. The authors propose the black-box optimization algorithm **EZR** and claim that it can approach optimal performance on a large set of software engineering tasks using very few samples.

## Problem
- Software verification and validation (**V&V**) is extremely costly, reportedly accounting for **60%** of project effort, and exhaustive verification is becoming increasingly infeasible in the presence of AI components, concurrency, and distributed systems.
- Traditional methods such as symbolic execution, model checking, ASP, and probabilistic programming depend on complex models; the authors argue that “building another model of the software” may itself be more expensive and more brittle than testing directly.
- The key question is: without understanding the system’s full internal mechanism, can we identify the small set of “control variables” that govern system behavior using only limited input/output sampling, and thereby steer the system toward a target state?

## Approach
- The core idea is **sparsity of influence**: although software state spaces are huge, outcomes are often determined by only a handful of key variables, frequently **fewer than 10**, so exhaustive search is unnecessary.
- The authors generalize testing as a black-box optimization/search problem: instead of proving that “all inputs are correct,” the goal is to find some input configuration that brings the system to an acceptable or even near-ideal target (called **Heaven** in the paper).
- They propose **EZR (Efficient Zero-knowledge Ranker)**: first randomly sample a small number of configurations, rank them by loss relative to the “ideal point,” and split the samples into **BEST** and **REST** groups.
- EZR discretizes each input attribute, looks for attribute ranges that occur frequently in **BEST** and infrequently in **REST**, and uses a simple two-class Bayesian-style score to select rules that best distinguish good outcomes.
- It then uses these high-scoring rules to constrain the next round of sampling, continually “herding” the search toward better regions. Compared with methods like TPE/SMAC, it avoids rebuilding complex models and instead uses incremental updates, making it lighter-weight.
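The loop described above can be sketched in a few dozen lines. This is an illustrative reconstruction, not the authors' code: the Euclidean distance-to-Heaven loss, the square-root BEST/REST cut, the equal-width binning (`BINS`), and the `best²/(best+rest)` range score are plausible placeholders inferred from the paper's description, and the budgets (`INIT`, `BUDGET`) are the paper's reported defaults.

```python
import random

BINS, INIT, BUDGET = 5, 8, 32   # illustrative defaults; 32 is the paper's budget

def loss(objectives, ideal):
    """Distance to the ideal point ("Heaven"); lower is better (assumed form)."""
    return sum((objectives[k] - v) ** 2 for k, v in ideal.items()) ** 0.5

def discretize(value, lo, hi, bins=BINS):
    """Map a numeric attribute value into one of `bins` equal-width ranges."""
    if hi == lo:
        return 0
    return min(bins - 1, int(bins * (value - lo) / (hi - lo)))

def ezr(sample, evaluate, ideal, attrs):
    """sample(rules) -> row dict; evaluate(row) -> dict of objective values."""
    rows = [sample({}) for _ in range(INIT)]               # initial random sample
    scored = [(loss(evaluate(r), ideal), r) for r in rows]
    for _ in range(BUDGET - INIT):
        scored.sort(key=lambda p: p[0])
        cut = max(1, int(len(scored) ** 0.5))              # split into BEST / REST
        best, rest = scored[:cut], scored[cut:]
        rules = {}
        for a in attrs:
            vals = [r[a] for _, r in scored]
            lo, hi = min(vals), max(vals)
            counts = {}                                    # bin -> class frequencies
            for label, group in (("best", best), ("rest", rest)):
                for _, r in group:
                    c = counts.setdefault(discretize(r[a], lo, hi),
                                          {"best": 0.0, "rest": 0.0})
                    c[label] += 1 / len(group)
            # two-class Bayesian-style score: ranges frequent in BEST, rare in REST
            b = max(counts, key=lambda k: counts[k]["best"] ** 2
                                          / (counts[k]["best"] + counts[k]["rest"]))
            step = (hi - lo) / BINS
            rules[a] = (lo + b * step, lo + (b + 1) * step)
        row = sample(rules)                                # herd sampling toward BEST
        scored.append((loss(evaluate(row), ideal), row))
    return min(scored, key=lambda p: p[0])[1]
```

Note the incremental flavor: each round only re-ranks the rows seen so far and recomputes lightweight bin counts, rather than refitting a surrogate model as TPE- or SMAC-style optimizers do.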

## Results
- The paper’s central experimental result is that on **63 tasks from the MOOT repository**, EZR reaches an average of **90% optimality** (relative to a reference optimum) using only **32 samples**.
- The relationship between sampling budget and performance is: **8 samples → 62%**, **16 samples → 80%**, **32 samples → 90%**, **64 samples → 91%**, **128 samples → 92%**; the authors emphasize a clear diminishing-returns knee at **32**.
- The dataset spans many categories of software engineering tasks, including system configuration, performance tuning, cloud deployment, Scrum planning, risk modeling, and test-case reduction; input dimensionality ranges from **3 to 88** for most tasks, rises to **124** in some, and reaches **1000** decision variables in the software product line (**SPL**) tasks.
- In Table 2, the average Opt% at **N=32** for each task family is approximately: system configuration **91%**, performance tuning **89%**, cloud **92%**, Scrum **88%**, risk **94%**, health **90%**, SPL **85%**, finance **93%**, talent **89%**, security **91%**, sales **87%**, control **95%**, and test reduction **92%**.
- The paper also claims that in comparisons with **SMAC, OPTUNA, DEHB, Random, KPP**, EZR “matches or outperforms” these methods; Figure 1 suggests its statistical performance is usually no worse than SMAC, but the main text does not provide a single unified win-rate table.
- On speed, the authors offer a semi-quantitative claim: over **20 runs** on the full dataset, EZR finishes in **minutes** while SMAC requires **days**; the excerpt gives no more precise absolute timings.
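The diminishing-returns knee can be read straight off the reported curve; a few lines make the marginal gains explicit (the Opt% numbers are the paper's, the framing of the computation is ours):

```python
# Opt% at each sampling budget, as reported in the paper.
curve = {8: 62, 16: 80, 32: 90, 64: 91, 128: 92}

# Marginal gain from each doubling of the budget.
budgets = sorted(curve)
gains = {b: curve[b] - curve[a] for a, b in zip(budgets, budgets[1:])}

# Doubling to 16 adds 18 points and to 32 adds 10, but every doubling
# beyond 32 adds only 1 point, which is why the authors place the knee at 32.
```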

## Link
- [http://arxiv.org/abs/2603.10478v1](http://arxiv.org/abs/2603.10478v1)
