Breeding Attackers: Where Evolutionary Search Wins and Where It Stalls

004 · 2026-04-06 · Fuzzing is an evolutionary attacker. I pointed the same breed-and-select loop at a timing side channel: it matched the standard attack but couldn't beat it. Why breeding attackers wins on some targets and stalls on others.

Fuzzing is one of the most reliable ways to find security bugs: you throw mangled inputs at a program until something crashes, then trace back what did it. Do it the modern way and you're running an evolutionary attacker, even if nobody calls it that. AFL — American Fuzzy Lop, named after a breed of rabbit — is the classic tool. It keeps a queue of inputs, mutates them, and holds onto any mutant that reaches a new corner of the program's code. The rest it throws away.

The trick that made AFL famous is coverage guidance. Older fuzzers mutated inputs blindly and hoped one happened to hit a crash. AFL instruments the program so it can watch which paths through the code each input takes, then rewards any input that reaches somewhere the others never did. That single change turned fuzzing from a lottery into a directed search — and it shook real bugs out of OpenSSL, SQLite, bash, and a long list of image, font, and media parsers. When people say "fuzzer" today, they usually mean something built on this idea.

Its author, Michal Zalewski, called it "a brute-force fuzzer coupled with an exceedingly simple but rock-solid instrumentation-guided genetic algorithm." Mutation plus selection, run against a binary you never have to understand. Nobody files fuzzing under machine learning, but that's what it is: a population of attacks, bred under selection pressure, with no model of the target beyond "did this input do something new?"

This works because the feedback is rich. Code coverage is a dense signal — almost every mutation moves the needle a little, so the fuzzer can always tell a better input from a worse one.

I wanted to see how far that generalizes. Take the same loop — breed candidates, keep the ones that score higher against a black box (a target you can poke from the outside but never open up) — and point it at something harder than code coverage: a cryptographic key leaking through a timing side channel. That's the real test, because this signal isn't dense. It's a cliff.

Here's the target. A server does some cryptography: you send it a number, it raises that number to a secret exponent, and sends back the result. You don't care about the result. You care about how long it took.

Different secret keys take measurably different amounts of time to process — the same way you can tell whether someone typed a 9 or a 1 on a keypad by the length of the pauses between clicks. That's a timing side channel. The secret leaks through the clock.

To score a key guess, you compare the timings it predicts against the timings you measured, and ask how well the two move together. That comparison is a correlation — the Pearson coefficient, taken as an absolute value: 1.0 for a perfect match, near 0 for no relationship. Right key, high correlation. Wrong key, noise.
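A minimal sketch of that score in Rust. The function name `abs_pearson` and the choice to return 0.0 when a series is constant are assumptions made for illustration, not details of the experiment's code.

```rust
/// Absolute Pearson correlation between predicted and measured timings.
/// Returns 0.0 when either series has no variance (the coefficient is
/// undefined there; treating it as "no relationship" is an assumption).
fn abs_pearson(predicted: &[f64], measured: &[f64]) -> f64 {
    assert_eq!(predicted.len(), measured.len(), "series must be the same length");
    let n = predicted.len() as f64;
    let mean_p = predicted.iter().sum::<f64>() / n;
    let mean_m = measured.iter().sum::<f64>() / n;

    let (mut cov, mut var_p, mut var_m) = (0.0, 0.0, 0.0);
    for (p, m) in predicted.iter().zip(measured) {
        let (dp, dm) = (p - mean_p, m - mean_m);
        cov += dp * dm;
        var_p += dp * dp;
        var_m += dm * dm;
    }
    if var_p == 0.0 || var_m == 0.0 {
        return 0.0;
    }
    // Sign is dropped: only the strength of the relationship matters here.
    (cov / (var_p * var_m).sqrt()).abs()
}
```

Taking the absolute value means a guess whose predictions move exactly opposite to the measurements scores as high as one that tracks them.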

Breeding attackers against a black box

Strip fuzzing down to its mechanism and you get a general recipe for attacking a black box. Start with a population of candidate attacks. Score each one against the target — any real number will do, however crude. Keep the high scorers. Breed them by mutation and crossover, where crossover just means mixing two good candidates into a new one. Delete the rest. Repeat.
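The recipe fits in a few dozen lines. This is a toy sketch, not the experiment's GA: candidates are plain bitstrings, and the `Rng` type, the `breed` function, the keep-the-top-half rule, and the one-bit mutation are all illustrative assumptions.

```rust
/// Tiny xorshift64 generator so the sketch needs no crates.
/// Seed must be nonzero; not suitable for anything security-related.
struct Rng(u64);

impl Rng {
    fn next_u64(&mut self) -> u64 {
        self.0 ^= self.0 << 13;
        self.0 ^= self.0 >> 7;
        self.0 ^= self.0 << 17;
        self.0
    }
    fn below(&mut self, n: usize) -> usize {
        (self.next_u64() % n as u64) as usize
    }
}

/// Breed-and-select over bitstring candidates. `score` is the black box:
/// any real number (not NaN), higher is better.
/// Assumes bits >= 1 and pop_size >= 2.
fn breed<F: Fn(&[bool]) -> f64>(
    score: F,
    bits: usize,
    pop_size: usize,
    generations: usize,
    seed: u64,
) -> Vec<bool> {
    let mut rng = Rng(seed);
    // Start with a population of random candidates.
    let mut pop: Vec<Vec<bool>> = (0..pop_size)
        .map(|_| (0..bits).map(|_| (rng.next_u64() & 1) == 1).collect())
        .collect();

    for _ in 0..generations {
        // Score everyone and rank best-first.
        pop.sort_by(|a, b| score(b.as_slice()).partial_cmp(&score(a.as_slice())).unwrap());
        // Keep the high scorers, delete the rest.
        let keep = pop_size / 2;
        pop.truncate(keep);
        // Refill: crossover splices two survivors, mutation flips one bit.
        while pop.len() < pop_size {
            let (a, b) = (rng.below(keep), rng.below(keep));
            let cut = rng.below(bits);
            let mut child: Vec<bool> = pop[a][..cut]
                .iter()
                .chain(pop[b][cut..].iter())
                .copied()
                .collect();
            let flip = rng.below(bits);
            child[flip] = !child[flip];
            pop.push(child);
        }
    }
    pop.sort_by(|a, b| score(b.as_slice()).partial_cmp(&score(a.as_slice())).unwrap());
    pop.swap_remove(0)
}
```

The loop only ever asks `score` which candidate is doing better. Because survivors are carried over unchanged, the best score in the population never goes down from one generation to the next.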

You never need the target's source code, its internals, or a labeled dataset. You need one thing: a score that says which candidate is doing better. That's the adversarial setting exactly. You're prying a secret out of a system that doesn't want to give it up, and no one hands you the answers to check against.

Compare that to how most models learn. You nudge the weights in whatever direction lowers the error, walking downhill along the slope — that's gradient descent. It needs a slope. A black box doesn't give you one: no derivative to read, no hill to walk down.

This is why the classical tools for this kind of problem are the derivative-free ones — the same breed-and-select idea, with no slope to follow. Differential evolution (Storn & Price, 1997) and evolution strategies like CMA-ES (which tunes its own step size as it searches) are population-based: they use only the objective's value, no gradient required. That makes them the standard pick when the objective is a black box, jagged, or noisy — which describes most attacks better than it describes most training problems.

That split is why evolutionary search shows up as an offensive tool and not just a training curiosity. Gradient descent is faster and sharper, but only when two things hold: you own the model, and you can write your goal as a smooth, differentiable loss. Point the same method at someone else's system and both conditions vanish. There are no internals to differentiate, and the only feedback is pass/fail or whatever score the target chooses to hand back.

Breeding attackers asks for exactly that and nothing more — is this attempt better than that one? — so it works against targets gradient descent can't touch. You pay in speed: a population gropes around by trial and error where a gradient would step straight to the answer. But a slow attack that runs beats a fast one that can't start.

Fuzzing is the headline example, but it's worth being precise, because Zalewski is. AFL isn't a textbook genetic algorithm — there's no explicit fitness function, no single score ranking a whole population of candidates from best to worst, just a "keep anything that hits a new state transition" novelty rule. He deliberately contrasts that with greedy genetic fuzzers that evolve a single input to maximize coverage, which in his own tests bought nothing over blind fuzzing. The novelty-preserving version is the one that works. Mutation plus selection, tuned to the shape of the signal.

The same loop shows up well beyond fuzzing. I kept finding it everywhere I looked.

The one-pixel attack (Su, Vargas, Sakurai) flips a single pixel — three or five in their larger tests — to fool an image classifier into mislabeling a picture. That's an adversarial example: a tiny tweak that breaks the model. It works with differential evolution and nothing but the classifier's output probabilities — no gradients, no internals, pure black-box queries. GenAttack (Alzantot et al., 2019) does the same job with a genetic algorithm, against MNIST, CIFAR-10, and ImageNet, from query access to output scores alone.

Malware detection has the same shape. EvadeML (Xu, Qi, Evans, 2016) uses genetic programming to mutate malicious PDFs — insert, delete, replace — with a sandbox oracle confirming each variant still runs its payload. It found an evasive variant for all 500 seeds against the PDFrate and Hidost classifiers. (A separate line of work, MalGAN, evades detectors with a GAN over feature vectors instead — different technique, different failure modes.)

All of those share one property: the score is graded. Fuzzing has edge coverage: mutate an input and you usually gain or lose a measurable amount. One-pixel and GenAttack have confidence scores: nudge a pixel and the classifier's probability shifts by a readable amount. EvadeML has an oracle that keeps confirming maliciousness while the detector's verdict slides. A partial success looks partially successful. Selection pressure has something to grip.

The timing side channel breaks that assumption. Its real objective — how well your predicted timings correlate with the measured ones — isn't graded at all. A key that's one early bit wrong scores the same near-zero as a random guess; get the early bits right and it snaps to a high correlation. Nothing in between. A dense-signal breeder like AFL is climbing a hillside — every step tells you whether you're getting warmer, so you always know which way is up. Here the landscape is a cliff, and a cliff is the one thing selection pressure can't climb. That gap — dense signal versus cliff — is the whole experiment.

The standard attack here is called CPA — correlation power analysis (Brier et al.), applied to timing traces instead of power consumption. You walk through the key one bit at a time. For each bit, you simulate what the timing should look like if that bit is 0, then again if it's 1. Whichever version correlates better with the real measurements wins. Greedy, fast, and it works.
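The shape of that greedy walk, as a sketch. The timing model here (`op_cost`: one unit per modular operation plus the number of 1-bits in its result) is invented so the example has something data-dependent to correlate against; it is not the target's real timing behavior. The names `simulate_time` and `greedy_recover` are likewise hypothetical, and nothing here claims the toy recovers keys reliably.

```rust
/// ASSUMED toy timing model, invented for this sketch: every modular
/// operation costs one unit plus the number of 1-bits in its result.
fn op_cost(value: u128) -> f64 {
    1.0 + value.count_ones() as f64
}

/// Simulated time to run square-and-multiply over `bits` (most significant first).
fn simulate_time(input: u64, bits: &[bool], modulus: u64) -> f64 {
    let m = modulus as u128;
    let base = input as u128 % m;
    let (mut acc, mut time) = (1u128, 0.0);
    for &bit in bits {
        acc = acc * acc % m;
        time += op_cost(acc);
        if bit {
            acc = acc * base % m;
            time += op_cost(acc);
        }
    }
    time
}

/// Absolute Pearson correlation; 0.0 if either series is constant.
fn abs_pearson(a: &[f64], b: &[f64]) -> f64 {
    let n = a.len() as f64;
    let (mean_a, mean_b) = (a.iter().sum::<f64>() / n, b.iter().sum::<f64>() / n);
    let (mut cov, mut var_a, mut var_b) = (0.0, 0.0, 0.0);
    for (x, y) in a.iter().zip(b) {
        cov += (x - mean_a) * (y - mean_b);
        var_a += (x - mean_a) * (x - mean_a);
        var_b += (y - mean_b) * (y - mean_b);
    }
    if var_a == 0.0 || var_b == 0.0 {
        0.0
    } else {
        (cov / (var_a * var_b).sqrt()).abs()
    }
}

/// Greedy walk: fix one bit at a time, keeping whichever hypothesis
/// predicts timings that correlate better with the measurements.
fn greedy_recover(inputs: &[u64], measured: &[f64], key_len: usize, modulus: u64) -> Vec<bool> {
    let mut key: Vec<bool> = Vec::with_capacity(key_len);
    for _ in 0..key_len {
        let mut scores = [0.0f64; 2]; // scores[0] for bit=0, scores[1] for bit=1
        for guess in [false, true] {
            key.push(guess);
            let predicted: Vec<f64> = inputs
                .iter()
                .map(|&x| simulate_time(x, &key, modulus))
                .collect();
            scores[guess as usize] = abs_pearson(&predicted, measured);
            key.pop();
        }
        // Commit to the better hypothesis; every later bit inherits this choice.
        key.push(scores[1] > scores[0]);
    }
    key
}
```

Each bit is decided with the earlier bits already frozen in `key`, so the predictions for later bits are only as good as the prefix they are built on.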

But CPA has a structural weakness. It commits to each bit one at a time, in order. If it gets bit 7 wrong, every later bit is judged in the context of that mistake, and the error compounds. With enough timing measurements the margins are clear and this doesn't matter. But cut the trace budget (a trace is one timing measurement; the budget is how many you're allowed) and on 32-bit keys you start seeing runs that get only about 60% of the bits right. CPA's greedy strategy hits a wall.

Could something learn to do better than that greedy walk? Specifically — could you breed a small neural network that finds keys CPA misses? A neural network here is just a small bundle of numeric weights that turns inputs into a guess, and "breed" means search for a set of weights that guesses well.

So I tried a genetic algorithm (GA) — the same breed-and-select loop as the fuzzer, aimed at key recovery instead of code coverage. Evolve a population of neural networks. Each one proposes a key. The ones that produce better timing correlations survive and breed. The rest die. No gradients, no labels, no training data — just selection pressure against a black box.

It didn't work. Or rather — it worked exactly well enough to match CPA, but not to beat it. The why is the part worth writing down. Sparse reward, a huge search space, partial solutions that look like noise — these problems fight back against evolution in ways you can pin down.

The setup

The target runs modular exponentiation — raise a number to a power, then keep the remainder after dividing by a fixed modulus. It's the core math operation behind RSA, the public-key cryptography behind a lot of the secure connections you use. It processes the secret key one bit at a time with the textbook square-and-multiply loop: always square the accumulator, and multiply by the input only when the bit is 1. A 1-bit costs an extra multiply, so it takes longer. Different keys take measurably different amounts of time to process the same input. That's the leak Paul Kocher turned into a working attack on RSA and Diffie-Hellman back in 1996.
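A sketch of that loop, with the number of conditional multiplies returned alongside the result as a crude stand-in for the extra time. The function name and the most-significant-bit-first slice representation are assumptions; a real target's timing also depends on the operands, which this count does not model.

```rust
/// Left-to-right square-and-multiply.
/// `key_bits` is the secret exponent, most significant bit first.
/// Returns (base^exponent mod modulus, number of conditional multiplies).
/// `modulus` must be nonzero.
fn square_and_multiply(base: u64, key_bits: &[bool], modulus: u64) -> (u64, u32) {
    // Work in u128 so the products of two values below 2^64 cannot overflow.
    let m = modulus as u128;
    let b = base as u128 % m;
    let mut acc: u128 = 1 % m;
    let mut multiplies = 0;

    for &bit in key_bits {
        acc = acc * acc % m; // always square the accumulator
        if bit {
            acc = acc * b % m; // extra multiply only when the bit is 1
            multiplies += 1;
        }
    }
    (acc as u64, multiplies)
}
```

For example, the exponent 13 is 1101 in binary, so `square_and_multiply(4, &[true, true, false, true], 497)` does four squarings and three multiplies.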

The per-bit scores CPA produces along the way — its confidence for 0 vs 1 at each position — are what matter next.

The neural network gets a different view. Instead of raw timing traces, it sees CPA's per-bit scores: for each bit position, how confident CPA is that the bit is 0, and how confident it is that the bit is 1. The network sees all bits at once and can potentially catch patterns CPA misses — cases where the correlations for one bit hint that a different bit was guessed wrong.

The network is small, sized for keys up to 64 bits: 128 inputs (two correlations per bit), 64 sigmoid outputs, and one hidden layer of 32 ReLU neurons in between. ReLU is the standard unit that zeroes out anything negative and passes positives through unchanged. About 6,000 parameters in total. The experiments here only go up to 32-bit keys, so not all of those inputs and outputs are in play — but it's small enough for a GA to search.
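To make the shape concrete, here is a sketch of a network with those dimensions. It assumes both layers are fully connected and carry biases, which the description above doesn't spell out; the `Net` type and its method names are illustrative.

```rust
const INPUTS: usize = 128; // two correlations per bit, up to 64 bits
const HIDDEN: usize = 32; // ReLU units
const OUTPUTS: usize = 64; // one sigmoid output per key bit

/// ASSUMED layout: dense layers with biases.
struct Net {
    w1: Vec<Vec<f64>>, // [HIDDEN][INPUTS]
    b1: Vec<f64>,
    w2: Vec<Vec<f64>>, // [OUTPUTS][HIDDEN]
    b2: Vec<f64>,
}

fn sigmoid(z: f64) -> f64 {
    1.0 / (1.0 + (-z).exp())
}

impl Net {
    fn zeros() -> Self {
        Net {
            w1: vec![vec![0.0; INPUTS]; HIDDEN],
            b1: vec![0.0; HIDDEN],
            w2: vec![vec![0.0; HIDDEN]; OUTPUTS],
            b2: vec![0.0; OUTPUTS],
        }
    }

    fn param_count(&self) -> usize {
        self.w1.iter().map(Vec::len).sum::<usize>()
            + self.b1.len()
            + self.w2.iter().map(Vec::len).sum::<usize>()
            + self.b2.len()
    }

    /// Pre-sigmoid outputs (logits), one per key bit.
    fn logits(&self, x: &[f64]) -> Vec<f64> {
        let hidden: Vec<f64> = self
            .w1
            .iter()
            .zip(&self.b1)
            .map(|(row, b)| (row.iter().zip(x).map(|(w, xi)| w * xi).sum::<f64>() + b).max(0.0))
            .collect();
        self.w2
            .iter()
            .zip(&self.b2)
            .map(|(row, b)| row.iter().zip(&hidden).map(|(w, h)| w * h).sum::<f64>() + b)
            .collect()
    }

    /// Threshold each sigmoid output at 0.5 to get key bits.
    fn key_guess(&self, x: &[f64]) -> Vec<bool> {
        self.logits(x).iter().map(|&z| sigmoid(z) > 0.5).collect()
    }
}
```

Under those assumptions the count is 128 × 32 + 32 + 32 × 64 + 64 = 6,240 parameters, in line with "about 6,000".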

Why not just train a model normally

The normal way to use machine learning for this kind of attack is supervised learning — training on examples whose answers you already know. You buy a copy of the target device, set the key yourself, and collect thousands of measurements where you know both the input and the secret. Each measurement gets a label: "this trace came from key 0x3F7A." Train a neural network on those labeled examples, then deploy it against the real target. That's the standard approach — deep-learning side-channel analysis, DLSCA (Zaid et al., Maghrebi et al.). It works well when you can profile a copy of the target.

I can't do that here. There's no copy to profile. The server generates a random key on startup. I never see it. I get timing measurements, not labeled training data. Nobody labels an attacker's data — that's the nature of the problem.

You could try gradient descent anyway — a loss function scores how wrong the network is, and backpropagation (backprop) nudges the weights down that slope, toward less wrong. But the key bits are discrete: you threshold the network output at 0.5 to get 0s and 1s. Rounding to a hard 0 or 1 flattens that slope — thresholding kills gradients. There are workarounds (relaxation tricks that fake gradients through discrete steps), but they run into the cliff from earlier — a random key and a key that's mostly right but wrong on an early bit both sit at ~0 correlation, so there's no slope for backprop to descend.
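A quick numeric check of the "thresholding kills gradients" point. The helper names (`hard_bit`, `slope`) are illustrative, and the finite-difference estimate is only a stand-in for what backprop would compute.

```rust
fn sigmoid(z: f64) -> f64 {
    1.0 / (1.0 + (-z).exp())
}

/// The hard decision used to read out a key bit.
fn hard_bit(logit: f64) -> f64 {
    if sigmoid(logit) > 0.5 { 1.0 } else { 0.0 }
}

/// Central finite-difference estimate of df/dz at `z`.
fn slope<F: Fn(f64) -> f64>(f: F, z: f64, h: f64) -> f64 {
    (f(z + h) - f(z - h)) / (2.0 * h)
}
```

`slope(sigmoid, 0.0, 1e-3)` comes out close to 0.25, while `slope(hard_bit, z, 1e-3)` is exactly zero at any `z` more than `h` away from the jump at zero. The rounded output gives the optimizer nothing to follow.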

[Interactive demo: a 16-bit secret key (1011001110100101) next to a guess whose bits can be flipped, with a count of wrong bits and the resulting correlation r. With 0/16 bits wrong, r = 1.000.]

In the demo, flipping bit 0 (leftmost) crashes the correlation to near zero, while flipping bit 15 (rightmost) barely moves it. Early bits matter more because each wrong guess changes all subsequent computations. That's the cliff: there's no smooth path from wrong to right.

A genetic algorithm doesn't need gradients, and it doesn't need labeled data. It works by trial and error at scale. You start with a population of random candidate networks. You score each one — somehow — and the highest-scoring ones survive to produce offspring (copies with small random mutations). The low-scoring ones get deleted. Repeat. Over generations, the population drifts toward better solutions. Basically evolution, but for neural network weights instead of organisms.

[Interactive demo: a genetic algorithm searching a 2D fitness landscape, stepped one or twenty generations at a time. Bright = high fitness. The dots converge toward the peak, and some get stuck on the smaller hill.]

The catch is that it needs a scoring function — a "fitness function" — that can rank candidates. Not perfectly, just well enough to tell slightly-better from slightly-worse. That turns out to be the entire problem, which I'll get to.

The tradeoff in efficiency is brutal. A GA with 40 individuals per island, 3 islands (independent sub-populations that occasionally swap members), 100 generations evaluates ~12,000 candidate networks. Backprop through the same network would converge in maybe 200 steps. But those 200 backprop steps require a loss function that points downhill, and this one doesn't. Slow and wasteful, but at least it can make progress.

The fitness function problem

This is where I wasted the most time, and where the real work is. The GA mechanics — crossover, mutation, island migration, tournament selection (pick survivors by small head-to-head contests) — are all textbook. You can get that right in an afternoon. The fitness function is the part where you stare at a flat line on a graph for hours and try to figure out why 120 neural networks all have the exact same score.

The obvious approach: for each candidate network, threshold its outputs at 0.5 to get key bits, simulate the timing, compute correlation with real timings. Simple, direct, obviously correct.

Dead on arrival. Every random network produces a random key, and random keys all get ~0 correlation. No way to tell which random network is slightly less terrible than another. No selection pressure. It's like judging a spelling bee where you can only say "wrong" or "perfect" — a contestant who gets 9 out of 10 letters right hears "wrong," same as someone who gets 1 right. You can't rank them. You can't select the better one. The correlation cliff kills you before evolution can start.

So I tried something cleverer: Monte Carlo sampling, which means estimating an answer by drawing lots of random samples and averaging them. Treat each network output as a probability. Sample 32 keys from it — per bit, a weighted coin flip that comes up 1 with the probability the network suggests — and average the correlations. The theory: networks whose probabilities lean slightly toward the right bits should generate slightly better keys on average.

Same problem, different disguise. The CPA baseline key (always included as sample 0) gives ~0.85 correlation. The 31 Bernoulli samples from a random network give ~0 each. Average: 0.85/32 ≈ 0.027. And because that CPA sample sits in every candidate's set, it adds the same constant to all of them — it can't separate one network from another. Strip it out and you're averaging 31 near-zero correlations. Flat landscape either way.
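The arithmetic, as a sketch. `monte_carlo_fitness` is a hypothetical helper that takes the correlations as already computed; it is not the experiment's scoring code.

```rust
/// Mean correlation over the shared CPA baseline sample plus a
/// candidate's own sampled keys.
fn monte_carlo_fitness(baseline_r: f64, sampled_r: &[f64]) -> f64 {
    (baseline_r + sampled_r.iter().sum::<f64>()) / (sampled_r.len() + 1) as f64
}
```

With a baseline of 0.85 and 31 samples at zero, this returns 0.85 / 32. For any two candidates, the difference in fitness depends only on their sampled correlations, because the baseline term cancels.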

Third try. Forget correlation entirely. Score each bit independently by how well the network agrees with CPA's recommendation. For each bit, CPA gives a margin (how confident it is that the bit is 0 or 1). The network gives a probability. Reward agreement, weighted by CPA's confidence. This has a real per-bit signal — each bit contributes independently, no cliff.
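One way such a score could look. The exact formula is an assumption: here the margin is taken as a signed value (positive when CPA leans toward 1), and the result is a confidence-weighted average of how much probability the network puts on CPA's pick.

```rust
/// Per-bit agreement with CPA, weighted by CPA's confidence.
/// `margins[i]` is an ASSUMED signed margin for bit i: positive means CPA
/// leans toward 1, negative toward 0, magnitude is its confidence.
/// `probs[i]` is the network's probability that bit i is 1.
/// Returns a value in [0, 1]; 0.5 means no opinion on any bit.
fn agreement_fitness(probs: &[f64], margins: &[f64]) -> f64 {
    let total_weight: f64 = margins.iter().map(|m| m.abs()).sum();
    if total_weight == 0.0 {
        return 0.5; // CPA has no preference anywhere
    }
    let agreed: f64 = probs
        .iter()
        .zip(margins)
        .map(|(&p, &m)| m.abs() * if m > 0.0 { p } else { 1.0 - p })
        .sum();
    agreed / total_weight
}
```

Each bit adds its own term to the sum, so getting one bit wrong costs only that bit's weight instead of collapsing the whole score.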

But there's a catch I didn't see coming. Every random network outputs "I don't know" for every bit — probabilities sitting right at 0.5.

The culprit is the sigmoid, the S-curve each output passes through. It squashes any value into a 0-to-1 probability, and near the middle it flattens everything toward 0.5. So two different networks come out looking almost the same. The arithmetic makes the point (you can skip the digits): a weight of 0.03 gives sigmoid(0.015) = 0.504, a weight of -0.03 gives sigmoid(-0.015) = 0.496, a gap of about 0.0075 before rounding. Across all bits, the fitness differences land in the fourth decimal place. The GA can't tell the networks apart. It's like trying to rank 40 people by height when they're all standing a mile away.

[Interactive demo: the same 40 random networks scored by two fitness functions. One clusters them at zero, so all networks score ~0 and there is no selection pressure. The other spreads them out, giving the GA something to select on.]

What actually worked

Two changes, together, got it moving.

Stop squashing the signal. The "I don't know" problem comes from sigmoid compressing everything near 0.5. But the raw values before sigmoid — the logits — do differ between networks. A logit of 0.01 vs -0.01 is invisible after sigmoid, since both map to ~0.5. So score the logits directly instead, passing them through tanh to keep the score in [-1, 1]. Tanh is another S-curve, but near zero it stays roughly linear and four times steeper than sigmoid (slope 1 vs 0.25), so it preserves the small gaps instead of collapsing them. Small fix, big impact.
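The slope claim is easy to check numerically. A small sketch, with `gaps` as an illustrative helper name:

```rust
fn sigmoid(z: f64) -> f64 {
    1.0 / (1.0 + (-z).exp())
}

/// How far apart two candidates' logits land after each squashing function.
/// Returns (gap after sigmoid, gap after tanh).
fn gaps(logit_a: f64, logit_b: f64) -> (f64, f64) {
    (
        (sigmoid(logit_a) - sigmoid(logit_b)).abs(),
        (logit_a.tanh() - logit_b.tanh()).abs(),
    )
}
```

For logits of 0.015 and -0.015, the sigmoid gap is roughly 0.0075 and the tanh gap roughly 0.03. That is the factor of four from the two slopes at zero.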

Give the GA a head start. Instead of starting from random networks, I built a network that already knows CPA's answer. For each bit position, the weights are set so the network computes "is the correlation for bit=1 stronger than for bit=0?" — exactly what CPA does, but expressed as neural network weights. Then I seeded half the initial population with slightly mutated copies of this network.

The CPA-copier network starts at ~0.96 fitness. The GA's job goes from "find a good network from scratch" — which is searching a 6,000-dimensional void — to "explore the neighborhood of a known-good solution." Mutations that improve on CPA's decisions get rewarded. Mutations that corrupt good bits get punished. The search happens somewhere useful.

[Interactive demo: choose an init strategy, then evolve. One option seeds half the population from the CPA copier (~0.96) and leaves the other half random. Seeded populations start near the answer and refine; random ones barely move.]

Here's how I encoded CPA into a network. r1 and r0 are the correlation coefficients under the bit=1 and bit=0 hypotheses, and what I actually want out is the signed difference scale * (r1 - r0). Trouble is, the hidden layer is ReLU, so a signed value can't pass straight through — ReLU clips anything negative to zero. The identity z = ReLU(z) - ReLU(-z) gets around that: use a pair of ReLU neurons whose difference reconstructs scale * (r1 - r0) exactly. Positive output means bit=1. Negative means bit=0. The code block below just wires that rule into the weights by hand — safe to skip.

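// Wire CPA's rule into the weights: output_i = scale * (r1_i - r0_i) via a ReLU pair.
// h_pos = ReLU(scale * (r1 - r0))  -- fires when CPA says 1
// h_neg = ReLU(scale * (r0 - r1))  -- fires when CPA says 0
// output = h_pos - h_neg
//
// Assumed layout (illustrative): bit i reads inputs 2i (r1) and 2i+1 (r0) and
// owns hidden units 2i (h_pos) and 2i+1 (h_neg), so it needs two hidden units
// per bit. w1 is [hidden][input], w2 is [output][hidden]; all other weights
// and biases are assumed to start at zero.
fn wire_cpa_copier(w1: &mut [Vec<f64>], w2: &mut [Vec<f64>], key_bits: usize, scale: f64) {
    for i in 0..key_bits {
        let (r1_input, r0_input) = (2 * i, 2 * i + 1);
        let (h_pos, h_neg) = (2 * i, 2 * i + 1);

        w1[h_pos][r1_input] = scale;
        w1[h_pos][r0_input] = -scale;
        w1[h_neg][r0_input] = scale;
        w1[h_neg][r1_input] = -scale;

        w2[i][h_pos] = 1.0;
        w2[i][h_neg] = -1.0;
    }
}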
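To see the ReLU-pair identity do its job, here is the same rule for a single bit written as plain arithmetic. The function name `cpa_copier_logit` is illustrative.

```rust
fn relu(z: f64) -> f64 {
    z.max(0.0)
}

/// One bit of the CPA-copier: two hidden ReLU units and a linear read-out.
/// Rebuilds scale * (r1 - r0) using z = ReLU(z) - ReLU(-z).
fn cpa_copier_logit(r1: f64, r0: f64, scale: f64) -> f64 {
    let h_pos = relu(scale * r1 - scale * r0); // nonzero when r1 > r0
    let h_neg = relu(scale * r0 - scale * r1); // nonzero when r0 > r1
    1.0 * h_pos + -1.0 * h_neg
}
```

At most one of the two hidden units is nonzero for any input, so the read-out is the signed difference itself: positive when the bit=1 hypothesis correlates better, negative when bit=0 does.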

Result on 16-bit keys: fitness starts at 0.96 in generation 1, stays above 0.95 throughout, and the attack recovers the full key — 100% accuracy, matching CPA exactly.

Where it stops working

On 16-bit keys, the hybrid matches CPA perfectly — 100% key recovery. But matching CPA was the baseline, not the goal. The goal was to beat it.

On 32-bit keys with a tight trace budget (64 training traces, 3 reps), CPA's greedy approach starts making mistakes: it recovers the full key about half the time, and gets 53-60% of the bits right when it fails. On these hard cases the hybrid nudges ahead: 56% of bits right where CPA gets 53%. But that's bit accuracy, barely above the 50% coin flip, and at that level neither one recovers the full key. Three percentage points of bit accuracy don't buy a single extra key. It edges CPA on the wrong metric: better bits, same keys.

But on the easy cases — where CPA already gets 100% — the network sometimes corrupts correct bits. And this is the structural problem I couldn't solve. The fitness function rewards "push your logits in the direction CPA suggests." That's a proxy for "produce correct keys." The two objectives overlap most of the time, but they're not the same thing. When the network needs to override CPA on a bit where CPA is wrong, the fitness function actively punishes it for doing so. The GA optimizes the proxy, not the real objective. It finds the gap and exploits it.

Concrete example: suppose CPA is confident that bit 5 is 1, but it's actually 0. Network A outputs 1 (wrong, but agrees with CPA) and gets a fitness bonus. Network B outputs 0 (correct, but disagrees with CPA) and gets penalized. Evolution kills Network B. The right answer gets selected against.

This is basically Goodhart's Law — "when a measure becomes a target, it ceases to be a good measure." You can't use the real objective (correlation) because of the cliff — it gives no signal. You use a proxy instead, and the proxy works well enough to get you to CPA-level performance, but then it becomes the ceiling. The GA can't distinguish "agrees with CPA because CPA is right" from "agrees with CPA because the fitness function said to." Breaking through would require a fitness function that rewards correct keys directly, which brings you back to the cliff.

The three walls

Three walls, none of them specific to my code — they're properties of the problem.

The cliff: you can't use the real objective (correlation) as fitness because partial solutions score the same as random noise. You're forced to use a proxy instead.

The proxy ceiling: the proxy (per-bit agreement with CPA) is learnable, but it tops out at CPA-level performance. The GA can't surpass CPA by optimizing "agree with CPA." To go further you'd need to reward correct keys directly, which puts you back at the cliff.

The signal floor: even a good proxy is useless if every candidate looks identical. Standard neural network initialization through sigmoid produces outputs so uniform the GA can't tell them apart. You have to score the raw values before sigmoid, and seed the population with something good, before selection pressure can work at all.

The biggest single improvement wasn't algorithmic — it was encoding CPA's strategy directly into the initial population. Which raises an uncomfortable question: is the GA actually learning anything, or is it just twitching around a hand-crafted answer? The 56% vs 53% improvement on hard cases suggests it's doing something, but it's a thin margin.

What I'd try next

Two things I haven't tested that might break through the proxy ceiling.

First: two-phase fitness. Use agreement-based training for the first 50 generations to reach CPA-level, then switch to correlation-based fitness. The theory is that once most of the population is near the cliff edge, the correlation signal becomes a ledge instead of a wall — a few networks will land on the right side and get rewarded. This might not work if "near the cliff edge" still means ~0 correlation for everyone. But it's the obvious next thing to try.

Second: selective override at attack time. Don't use the network's output as a complete key. Use CPA's key as the default, and only override bits where the network strongly disagrees (high-confidence logit in the opposite direction). This sidesteps the proxy ceiling by not asking the GA to beat CPA everywhere — just on the bits CPA is least sure about.
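A sketch of what that override could look like. Like the idea itself, it is untested; the function name and the `threshold` parameter (how large a logit counts as "strongly disagrees") are assumptions.

```rust
/// Start from CPA's key and flip only the bits where the network
/// disagrees with a logit whose magnitude reaches `threshold`.
/// A positive logit means the network says 1, a negative one says 0.
fn selective_override(cpa_key: &[bool], logits: &[f64], threshold: f64) -> Vec<bool> {
    cpa_key
        .iter()
        .zip(logits)
        .map(|(&cpa_bit, &z)| {
            let net_bit = z > 0.0;
            if net_bit != cpa_bit && z.abs() >= threshold {
                net_bit // confident disagreement: trust the network
            } else {
                cpa_bit // otherwise keep CPA's answer
            }
        })
        .collect()
}
```

With a threshold of 2.0, a CPA key of `[true, true, false]` and logits `[-0.1, -3.0, 2.5]` become `[true, false, true]`: the weak disagreement on the first bit is ignored and the two strong ones win.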

So is breeding attackers a viable offensive method? Where the target hands back a graded score, absolutely — it's what a fuzzer does every day, and it works against classifiers and malware detectors for the same reason: selection pressure has a slope to climb. The question I actually asked was narrower — can it climb a cliff? — and the answer is no. The GA here did what GAs do: search, select, breed. It matched CPA and topped out there, because the wall is the fitness function, not the GA. When the real reward is a cliff and partial answers look like noise, everything struggles. Evolution has no special advantage. It just fails differently than gradient descent — slower, but the failures are easier to read.

So the rule I'd carry out of this: before you reach for an evolutionary attacker, look at the reward. If partial success looks partially successful — coverage, a confidence score, an oracle that keeps saying "still works" — breed away; it's the right tool, and it's already proven. If partial success looks identical to noise, a GA won't save you. You need a smoother score, and you may not have one.

The actual question I came away with isn't "GA vs gradient descent" or "evolutionary vs supervised." It's: how do you build a smooth score for a problem where the real answer is pass/fail? That question comes up whether you're doing neuroevolution, reinforcement learning, or anything else that optimizes against a black box. I don't have an answer. But I have a clearer picture of why it's hard, and six fitness functions that don't work to show for it.

References

Kocher, 1996 — Timing attacks on implementations of Diffie-Hellman, RSA, DSS
Zaid et al., 2019 — Methodology for Efficient CNN Architectures in Profiling Attacks
Maghrebi et al., 2016 — Breaking Cryptographic Implementations Using Deep Learning Techniques
Brier et al., 2004 — Correlation power analysis with a leakage model
Zalewski — AFL technical details — coverage-guided fuzzing as an instrumentation-guided genetic algorithm
Su, Vargas, Sakurai, 2019 — One pixel attack for fooling deep neural networks (differential evolution, black-box)
Alzantot et al., 2019 — GenAttack: practical black-box attacks with gradient-free optimization
Xu, Qi, Evans, 2016 — Automatically evading classifiers (genetic programming vs PDF malware detectors)
Storn & Price, 1997 — Differential evolution: derivative-free global optimization