← back

Evolving Side-Channel Attacks with Genetic Algorithms

004 · 2026-04-06 · I tried to breed neural networks that break cryptographic keys from timing leaks. It matched the standard attack but couldn't beat it. The failure modes are the interesting part.

A server does some cryptography. You send it a number, it raises that number to a secret exponent, and sends back the result. But you're not interested in the result. You're interested in how long it took. Because different secret keys produce measurably different runtimes — the same way you can tell someone typed a 9 vs a 1 on a keypad by listening to the pauses between clicks. That's a timing side channel. The secret leaks through the clock. To measure how good a key guess is, you check "correlation" — how closely your predicted timings match the real ones. A correlation of 1.0 means a perfect match. Near 0 means no relationship. If you guess the right key, correlation is high. Wrong key, it's noise.

The standard attack on this — Correlation Power Analysis — is well-understood. You walk through each key bit in order. For each bit, you simulate what the timing should look like if that bit is 0, and again if it's 1. Whichever correlates better with the real measurements wins. Greedy, fast, and it works.

But CPA has a structural weakness. It commits to each bit one at a time, in order. If it gets bit 7 wrong, every subsequent bit is evaluated in the context of that mistake. The error compounds. With enough timing measurements the margins are clear and this doesn't matter, but cut the trace budget and you start seeing 60% accuracy on 32-bit keys. CPA's greedy strategy hits a wall.

I wanted to know: could something learn to do better? Specifically — could you breed a small neural network that finds keys CPA misses? Not with gradient descent. The loss function here isn't differentiable in any useful way. The thing you care about is "does this key produce timing predictions that correlate with the real measurements?" Flip one bit of a 32-bit key and your correlation can drop from 0.85 to nothing. There's no smooth gradient to follow.

So I tried a genetic algorithm. Evolve a population of neural networks. Each one proposes a key. The ones that produce better timing correlations survive and breed. The rest die. No gradients, no labels, no training data — just selection pressure against a black box.

It didn't work. Or rather — it worked exactly well enough to match CPA, but not to beat it. What's interesting is why it got stuck. These kinds of problems — sparse reward, huge search space, partial solutions that look like noise — fight back against evolution in ways you can actually pin down.

The setup

The target runs modular exponentiation — the core math operation behind RSA. It processes the secret key one bit at a time: for each bit, it always squares an accumulator, and if the bit is 1, it also multiplies by the input. More multiplies means more time. Different keys take measurably different amounts of time to process the same input. That's the leak.

CPA exploits this by guessing one bit at a time. For bit 0, it simulates two scenarios: "what would the timing look like if this bit is 0?" and "what if it's 1?" It measures which scenario correlates better with the real timing data, picks the winner, locks it in, and moves to the next bit. Left to right, no backtracking. It's a strong algorithm — fast, well-understood, and it works. But it commits to each guess before seeing the rest. If it gets bit 3 wrong, bits 4 through 31 are all evaluated assuming bit 3 is correct. The mistake propagates.

The neural network gets a different view. Instead of raw timing traces, it sees CPA's per-bit scores: for each bit position, how confident CPA is that the bit is 0, and how confident it is that the bit is 1. The network sees all bits at once and can potentially catch patterns CPA misses — cases where the correlations for one bit hint that a different bit was guessed wrong.

Small network. 128 inputs (two correlations per bit, up to 64 bits), one hidden layer of 32 neurons with ReLU, 64 sigmoid outputs. About 6,000 parameters. Small enough for a GA to search.

Why not just train a model normally

The normal way to use machine learning for this kind of attack is supervised learning. You buy a copy of the target device, set the key yourself, and collect thousands of measurements where you know both the input and the secret. Each measurement gets a label: "this trace came from key 0x3F7A." Train a neural network on those labeled examples, then deploy it against the real target. This is the standard DLSCA approach (Zaid et al., Maghrebi et al.) and it works well when you can profile a copy of the target.

I can't do that here. There's no copy to profile. The server generates a random key on startup. I never see it. I get timing measurements, not labeled training data. Nobody is going to label my data for me — that's the nature of adversarial problems. You're trying to extract a secret from a system that doesn't want to give it up. There's nothing to supervise against.

You could try gradient descent anyway — treat the correlation as a loss function and backpropagate. But the key bits are discrete: you threshold the network output at 0.5 to get 0s and 1s. Thresholding kills gradients. There are workarounds for this (relaxation tricks that fake gradients through discrete steps), but they hit a deeper problem: the loss landscape is a cliff. A random key gets ~0 correlation. A key that's 90% correct might also get ~0 correlation if the wrong bits happen to matter. Then suddenly, at ~95% correct, correlation jumps to 0.85. There's no slope to descend. Just a cliff edge you either fall off or don't.

secret key
1011001110100101
your guess — click to flip
1011001110100101
0/16 wrong
r = 1.000
Try flipping bit 0 (leftmost) — correlation crashes to near zero. Now reset and flip bit 15 (rightmost) — barely moves. Early bits matter more because each wrong guess changes all subsequent computations. That's the cliff: there's no smooth path from wrong to right.

A genetic algorithm doesn't need gradients, and it doesn't need labeled data. It works by trial and error at scale. You start with a population of random candidate networks. You score each one — somehow — and the highest-scoring ones survive to produce offspring (copies with small random mutations). The low-scoring ones get deleted. Repeat. Over generations, the population drifts toward better solutions. Basically evolution, but for neural network weights instead of organisms.

gen 0 — click +1 gen or +20 gen to evolve
A genetic algorithm searching a 2D fitness landscape. Bright = high fitness. Watch the dots converge toward the peak — and notice some getting stuck on the smaller hill.

The catch is that it needs a scoring function — a "fitness function" — that can rank candidates. Not perfectly, just well enough to tell slightly-better from slightly-worse. That turns out to be the entire problem, which I'll get to.

The tradeoff in efficiency is brutal. A GA with 40 individuals per island, 3 islands, 100 generations evaluates ~12,000 candidate networks. Backprop through the same network would converge in maybe 200 steps. But those 200 backprop steps require a loss function that points downhill, and this one doesn't. Slow and wasteful, but at least it can make progress.

The fitness function problem

This is where I wasted the most time, and where the interesting stuff actually is. The GA mechanics — crossover, mutation, island migration, tournament selection — are all textbook. You can get that right in an afternoon. The fitness function is the part where you stare at a flat line on a graph for hours and try to figure out why 120 neural networks all have the exact same score.

The obvious approach: for each candidate network, threshold its outputs at 0.5 to get key bits, simulate the timing, compute correlation with real timings. Simple, direct, obviously correct.

Dead on arrival. Every random network produces a random key, and random keys all get ~0 correlation. No way to tell which random network is slightly less terrible than another. No selection pressure. It's like judging a spelling bee where you can only say "wrong" or "perfect" — a contestant who gets 9 out of 10 letters right hears "wrong," same as someone who gets 1 right. You can't rank them. You can't select the better one. The correlation cliff kills you before evolution can start.

So I tried something more clever: Monte Carlo sampling. Treat the network outputs as probabilities, sample 32 keys from Bernoulli distributions, average the correlations. The theory was that networks whose probabilities lean slightly toward the right bits would generate slightly better keys on average.

Same problem, different disguise. The CPA baseline key (always included as sample 0) gives ~0.85 correlation. The 31 Bernoulli samples from a random network give ~0 each. Average: 0.85/32 = 0.026. Every network gets roughly the same 0.026 plus noise. Flat landscape, again.

Third try, and this is where it got interesting. Forget correlation entirely. Score each bit independently by how well the network agrees with CPA's recommendation. For each bit, CPA gives a margin (how confident it is that the bit is 0 or 1). The network gives a probability. Reward agreement, weighted by CPA's confidence. This has a real per-bit signal — each bit contributes independently, no cliff.

But there's a catch I didn't see coming. Every random network outputs "I don't know" for every bit — probabilities hovering right at 0.5. A random weight of 0.03 times an input of 0.5 gives 0.015 — and sigmoid(0.015) = 0.504. Another network with weight -0.03 gives sigmoid(-0.015) = 0.496. The difference between these two networks is 0.008. Across all bits, the fitness differences land in the fourth decimal place. The GA can't distinguish them. It's like trying to rank 40 people by height when they're all standing a mile away.

all networks score ~0 — no selection pressure
Same 40 random networks, two fitness functions. One clusters them at zero. The other spreads them out — giving the GA something to select on.

What actually worked

Two things in combination.

Stop squashing the signal. The "I don't know" problem happens because sigmoid compresses everything near 0.5. But the raw values before sigmoid — the logits — actually differ between networks. A logit of 0.01 vs -0.01 is invisible after sigmoid (both map to ~0.5), but if you score the raw values directly, using tanh to keep them in [-1, 1], the GA can finally see the differences. Small fix, big impact.

Give the GA a head start. Instead of starting from random networks, I built a network that already knows CPA's answer. For each bit position, the weights are set so the network computes "is the correlation for bit=1 stronger than for bit=0?" — exactly what CPA does, but expressed as neural network weights. Then I seeded half the initial population with slightly mutated copies of this network.

This changed everything. The CPA-copier network starts at ~0.96 fitness. The GA's job goes from "find a good network from scratch" — which is searching a 6,000-dimensional void — to "explore the neighborhood of a known-good solution." Mutations that improve on CPA's decisions get rewarded. Mutations that corrupt good bits get punished. The search happens somewhere useful.

half seeded from CPA copier (~0.96), half random
Choose init strategy, then evolve. Seeded populations start near the answer and refine. Random ones barely move.

The trick for encoding CPA into a network: for each key bit, use a pair of ReLU neurons to compute the difference between the two correlation hypotheses. Positive output means CPA favors bit=1. Negative means bit=0.

// construct output = scale * (r1 - r0) per bit via ReLU
// h_pos = ReLU(scale * (r1 - r0))  -- fires when CPA says 1
// h_neg = ReLU(scale * (r0 - r1))  -- fires when CPA says 0
// output = h_pos - h_neg

for i in 0..key_bits {
    w1[h_pos][r1_input] =  scale;
    w1[h_pos][r0_input] = -scale;
    w1[h_neg][r0_input] =  scale;
    w1[h_neg][r1_input] = -scale;

    w2[output_i][h_pos] =  1.0;
    w2[output_i][h_neg] = -1.0;
}

Result on 16-bit keys: fitness starts at 0.96 in generation 1, stays above 0.95 throughout, and the attack recovers the full key — 100% accuracy, matching CPA exactly.

Where it stops working

On 16-bit keys, the hybrid matches CPA perfectly — 100% key recovery. But matching CPA was the baseline, not the goal. The goal was to beat it.

On 32-bit keys with a tight trace budget (64 training traces, 3 reps), CPA's greedy approach starts making mistakes — full key about half the time, 55-60% accuracy when it fails. The hybrid does edge CPA out on these hard cases. When CPA gets 53% of bits right, the network gets 56%. Consistent small improvement on borderline keys.

But on the easy cases — where CPA already gets 100% — the network sometimes corrupts correct bits. And this is the structural problem I couldn't solve. The fitness function rewards "push your logits in the direction CPA suggests." That's a proxy for "produce correct keys." The two objectives overlap most of the time, but they're not the same thing. When the network needs to override CPA on a bit where CPA is wrong, the fitness function actively punishes it for doing so. The GA optimizes the proxy, not the real objective. It finds the gap and exploits it.

Concrete example: suppose CPA is confident that bit 5 is 1, but it's actually 0. Network A outputs 1 (wrong, but agrees with CPA) and gets a fitness bonus. Network B outputs 0 (correct, but disagrees with CPA) and gets penalized. Evolution kills Network B. The right answer gets selected against.

This is basically Goodhart's Law — "when a measure becomes a target, it ceases to be a good measure." You can't use the real objective (correlation) because of the cliff — it gives no signal. You use a proxy instead, and the proxy works well enough to get you to CPA-level performance, but then it becomes the ceiling. The GA can't distinguish "agrees with CPA because CPA is right" from "agrees with CPA because the fitness function said to." Breaking through would require a fitness function that rewards correct keys directly, which brings you back to the cliff.

The three walls

Looking back, the experiment hit three walls. None of them are specific to my implementation — they're properties of the problem.

The cliff: you can't use the real objective (correlation) as fitness because partial solutions score the same as random noise. You're forced to use a proxy instead.

The proxy ceiling: the proxy (per-bit agreement with CPA) is learnable, but it tops out at CPA-level performance. The GA can't surpass CPA by optimizing "agree with CPA." To go further you'd need to reward correct keys directly, which puts you back at the cliff.

The signal floor: even a good proxy is useless if every candidate looks identical. Standard neural network initialization through sigmoid produces outputs so uniform the GA can't tell them apart. You have to score the raw values before sigmoid, and seed the population with something good, before selection pressure can work at all.

The biggest single improvement wasn't algorithmic — it was encoding CPA's strategy directly into the initial population. Which raises an uncomfortable question: is the GA actually learning anything, or is it just twitching around a hand-crafted answer? The 56% vs 53% improvement on hard cases suggests it's doing something, but it's a thin margin.

What I'd try next

Two things I haven't tested that might break through the proxy ceiling.

First: two-phase fitness. Use agreement-based training for the first 50 generations to reach CPA-level, then switch to correlation-based fitness. The theory is that once most of the population is near the cliff edge, the correlation signal becomes a ledge instead of a wall — a few networks will land on the right side and get rewarded. This might not work if "near the cliff edge" still means ~0 correlation for everyone. But it's the obvious next thing to try.

Second: selective override at attack time. Don't use the network's output as a complete key. Use CPA's key as the default, and only override bits where the network strongly disagrees (high-confidence logit in the opposite direction). This sidesteps the proxy ceiling by not asking the GA to beat CPA everywhere — just on the bits CPA is least sure about.

The broader question — "are GAs a viable adversarial tool?" — I think I asked it wrong. The GA itself is fine. It does what GAs do: search, select, breed. The problem is the fitness function, and that's not a GA problem — it's a problem with the shape of the objective. When the real reward is a cliff and partial answers look like noise, everything struggles. The GA doesn't have a special advantage. It just fails differently than gradient descent — slower, but the failures are easier to understand.

The actual question I came away with isn't "GA vs SGD" or "evolutionary vs supervised." It's: how do you build a smooth score for a problem where the real answer is pass/fail? That question comes up whether you're doing neuroevolution, reinforcement learning, or anything else that optimizes against a black box. I don't have an answer. But I have a clearer picture of why it's hard, and six fitness functions that don't work to show for it.


References