Strategy teardown

Does EURUSD/GBPUSD pairs trading actually work?

Last time, a triangular-arbitrage system fell apart. Its legitimate cousin is statistical arbitrage — trade the mean-reverting spread between two cointegrated instruments. EURUSD and GBPUSD move together almost in lockstep. So does pairs trading them work? We ran it properly. Correlation, it turns out, is not cointegration.

By ridingyo · 2026-06-17 · Original research · honeybee EURUSD/GBPUSD 1h, 2024–2026

The method. Hedge ratio from an in-sample regression, spread = log(EURUSD) − β·log(GBPUSD), a rolling z-score for the signal, and Optuna over the lookback and the entry/exit thresholds. We hold out the last 40% for out-of-sample, model a per-leg cost, and deflate the result with the Deflated Sharpe Ratio for the 300 trials we ran. Two years of hourly bars (≈12,400).
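The spread and signal construction can be sketched in a few lines. This is a minimal illustration on synthetic data, not the code behind the results: the function names, the 100-bar lookback, and the generated series are all assumptions made for the example. The one detail taken from the method is that β is fitted on the in-sample 60% only.

```python
import numpy as np

def hedge_ratio(log_a, log_b):
    """OLS slope of log_a on log_b (with intercept). Fit on in-sample data only."""
    slope, _intercept = np.polyfit(log_b, log_a, 1)
    return slope

def rolling_zscore(spread, lookback):
    """Z-score of each bar against the trailing `lookback` bars, current bar included."""
    spread = np.asarray(spread, dtype=float)
    z = np.full(len(spread), np.nan)
    for t in range(lookback - 1, len(spread)):
        window = spread[t - lookback + 1 : t + 1]
        sd = window.std(ddof=1)
        if sd > 0:
            z[t] = (spread[t] - window.mean()) / sd
    return z

# Synthetic stand-ins for the two log-price series (hypothetical data).
rng = np.random.default_rng(0)
log_gbp = np.log(1.27) + np.cumsum(rng.normal(0, 1e-3, 2000))
log_eur = 0.8 * log_gbp + rng.normal(0, 1e-3, 2000)

split = int(len(log_eur) * 0.6)                 # first 60% is in-sample
beta = hedge_ratio(log_eur[:split], log_gbp[:split])
spread = log_eur - beta * log_gbp               # spread = log(A) - beta * log(B)
z = rolling_zscore(spread, lookback=100)
```

Fitting β on the full history would leak out-of-sample information into the spread, which is why the split comes before the regression.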

Correlation is not cointegration

EURUSD and GBPUSD are both quoted against the dollar, so they move together: the hourly return correlation is 0.79. That is what makes people reach for a pairs trade. But a pairs trade does not need correlation; it needs cointegration, a spread that wanders off and is reliably pulled back to a stable mean. Those are not the same thing.

Figure: Top: the two pairs track each other (correlation 0.79). Bottom: their spread drifts for months and never settles. An Engle-Granger test gives p = 0.17, so we cannot reject "not cointegrated."

The spread spends 2024 below its in-sample mean and 2025–26 above it. There is no stable level to revert to, which means the central assumption of the trade is missing from the start. The cointegration test is the gate, and this pair does not pass it.
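For readers who want to see what the gate does mechanically, here is a rough sketch of the two-step Engle-Granger idea using only NumPy. It is a simplification: it skips lag augmentation and reports only a test statistic, which it compares to an approximate asymptotic 5% critical value of about −3.34 (MacKinnon tables, two series with a constant). In practice a library routine such as `coint` in `statsmodels.tsa.stattools` adds lag selection and returns a p-value like the 0.17 quoted above. The function name and the synthetic data are assumptions for illustration.

```python
import numpy as np

# Approximate asymptotic 5% critical value for two series with a constant.
EG_CRIT_5PCT = -3.34

def engle_granger_stat(y, x):
    """Two-step Engle-Granger statistic without lag augmentation.

    Step 1: regress y on x with an intercept and keep the residuals.
    Step 2: regress the change in residuals on the lagged residuals;
            the t-statistic of that slope is the test statistic.
    """
    X = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    lagged, delta = resid[:-1], np.diff(resid)
    rho = (lagged @ delta) / (lagged @ lagged)
    err = delta - rho * lagged
    se = np.sqrt((err @ err) / (len(delta) - 1) / (lagged @ lagged))
    return rho / se

# Hypothetical data: one genuinely cointegrated pair, one pair of unrelated random walks.
rng = np.random.default_rng(1)
x = np.cumsum(rng.normal(size=3000))
y_coint = 0.5 * x + rng.normal(size=3000)      # stationary spread around 0.5 * x
y_indep = np.cumsum(rng.normal(size=3000))     # no stable spread to x

stat_coint = engle_granger_stat(y_coint, x)
stat_indep = engle_granger_stat(y_indep, x)
passes_gate = stat_coint < EG_CRIT_5PCT        # more negative = stronger evidence
```

Only a statistic more negative than the critical value rejects "not cointegrated"; a pair that fails, as EURUSD/GBPUSD does here, should not go on to the optimizer.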

The in-sample mirage

Suppose we ignore the test and optimize anyway: 300 Optuna trials over the lookback and z-score thresholds, scored on in-sample Sharpe. The best run produces a tempting, rising in-sample curve. Then the out-of-sample 40% begins.

Figure: The best of 300 trials climbs ~7% in-sample (grey), then reverses and bleeds out-of-sample (blue), about −2% after costs. The dashed line marks the split.

This is the shape of overfitting, not edge. The in-sample Sharpe is about 0.01 — the gain is noise dressed as a trend. The signal does fire: the z-score crosses the bands often enough. It just does not pay, because the spread overshoots and wanders instead of snapping back.

Figure: Out-of-sample z-score with the optimized bands. Plenty of crossings, but the spread routinely runs to ±4σ before turning, which is a losing trade, not a reverting one.
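To make the band logic concrete, here is one plausible way to turn a z-score series into positions. The exact entry and exit rules used in the study are not spelled out, so the rules below (enter beyond ±entry, close once z is back inside ±exit) and the name `positions_from_z` are assumptions. The example values include an overshoot to show how a position is held while the spread keeps running.

```python
def positions_from_z(z, entry, exit_):
    """Map z-scores to spread positions: +1 long, -1 short, 0 flat.

    Enter short when z > entry, long when z < -entry.
    Close when z has come back inside the exit band.
    NaN values fail every comparison, so the position is simply held.
    """
    pos, out = 0, []
    for value in z:
        if pos == 0:
            if value > entry:
                pos = -1          # spread is rich: short it
            elif value < -entry:
                pos = 1           # spread is cheap: buy it
        elif pos == 1 and value >= -exit_:
            pos = 0               # long closed after reverting upward
        elif pos == -1 and value <= exit_:
            pos = 0               # short closed after reverting downward
        out.append(pos)
    return out

# Hypothetical z-scores: the short entered at 2.5 sits through a run to 3.5,
# and the long entered at -2.2 sits through a run to -4.0.
example = positions_from_z([0.0, 2.5, 3.5, 1.0, 0.3, -2.2, -4.0, -0.4],
                           entry=2.0, exit_=0.5)
```

With rules like these, every overshoot past the entry band is a mark-to-market loss carried until the spread turns, which is the failure mode the chart shows.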

The Deflated Sharpe puts a number on it

Three hundred parameter trials is three hundred chances to fit noise. The Deflated Sharpe Ratio corrects the headline for exactly that. Feeding the trial count back in:

Metric	In-sample	Out-of-sample
Sharpe (after cost)	0.01	−0.01
Deflated Sharpe (300 trials)	—	0.01
Return	+7%	−2.2%
Break-even cost	—	0.0 bps

A Deflated Sharpe of 0.01 says there was essentially no real edge to find; the in-sample curve is the luckiest of 300 draws from noise. And the break-even cost is zero — the strategy fails to clear even a frictionless market, let alone real spreads.
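The correction can be sketched with the standard library alone. This follows the usual Bailey and López de Prado formulation, in which the Deflated Sharpe is a probability between 0 and 1 that the observed Sharpe beats the best result expected from pure luck. The cross-trial standard deviation of Sharpe ratios is not reported in this article, so `sharpe_std=0.01` below is a placeholder assumption, as are the function names; the 7,440 bars are simply 60% of the ≈12,400 hourly bars.

```python
from math import e, sqrt
from statistics import NormalDist

EULER_GAMMA = 0.5772156649015329
_NORMAL = NormalDist()

def expected_max_sharpe(n_trials, sharpe_std):
    """Expected best Sharpe among n_trials independent trials with zero true edge."""
    return sharpe_std * (
        (1 - EULER_GAMMA) * _NORMAL.inv_cdf(1 - 1 / n_trials)
        + EULER_GAMMA * _NORMAL.inv_cdf(1 - 1 / (n_trials * e))
    )

def deflated_sharpe(sharpe, n_obs, n_trials, sharpe_std, skew=0.0, kurt=3.0):
    """Probability that the observed Sharpe exceeds the luck benchmark.

    `sharpe` and `sharpe_std` are per-bar values (not annualized);
    `skew` and `kurt` describe the strategy's return distribution.
    """
    benchmark = expected_max_sharpe(n_trials, sharpe_std)
    denom = sqrt(1 - skew * sharpe + (kurt - 1) / 4 * sharpe ** 2)
    return _NORMAL.cdf((sharpe - benchmark) * sqrt(n_obs - 1) / denom)

# Placeholder inputs: 300 trials, 7,440 in-sample bars, assumed spread of trial Sharpes.
luck_bar = expected_max_sharpe(n_trials=300, sharpe_std=0.01)
dsr = deflated_sharpe(sharpe=0.01, n_obs=7440, n_trials=300, sharpe_std=0.01)
```

The benchmark grows with the number of trials, so the same headline Sharpe is worth less after 300 attempts than after 3.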

Figure: Out-of-sample Sharpe versus per-trade cost. It starts at zero and only falls: there is no cost low enough to make this profitable.
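A cost sweep of this kind is easy to reproduce. The sketch below assumes costs are charged in basis points on each of two legs whenever the position changes; that cost model, the function names, and the toy numbers are illustrative assumptions rather than the study's exact setup.

```python
import numpy as np

def net_sharpe(gross, turnover, cost_bps, legs=2):
    """Per-bar Sharpe after charging cost_bps on each leg per unit of position change."""
    net = np.asarray(gross) - np.asarray(turnover) * legs * cost_bps * 1e-4
    sd = net.std(ddof=1)
    return net.mean() / sd if sd > 0 else 0.0

def break_even_cost(gross, turnover, grid_bps):
    """Largest cost on the grid that still leaves a positive Sharpe; 0.0 if none does."""
    profitable = [c for c in grid_bps if net_sharpe(gross, turnover, c) > 0]
    return max(profitable) if profitable else 0.0

# Toy per-bar gross returns and position changes (hypothetical numbers).
gross = np.array([0.0010, -0.0005, 0.0008, 0.0002])
turnover = np.array([1, 0, 1, 0])
grid = [0, 1, 2, 3, 4, 5]

with_edge = break_even_cost(gross, turnover, grid)      # some cost is survivable
without_edge = break_even_cost(-gross, turnover, grid)  # loses even at zero cost
```

A break-even of 0.0 bps, as in the table above, is the second case: the strategy has nothing to give up to costs in the first place.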

Verdict

Does it survive validation?

No — and the cointegration test said so up front.

EURUSD and GBPUSD are correlated but not cointegrated, so their spread has no stable mean to trade. The optimized in-sample curve is an overfit mirage (Deflated Sharpe ≈ 0), and out-of-sample there is no edge even before costs: the Sharpe is zero at zero cost and negative after it.

The method is right; the pair is wrong

Statistical arbitrage is real and tradeable — but only on instruments that are genuinely cointegrated, not merely correlated. Better hunting grounds: two stocks in the same sector with a shared driver, the front and back contracts of the same future, or a hard-pegged crypto pair. The discipline that matters is the order of operations:

Test for cointegration first. If the spread is not stationary, stop — no amount of threshold tuning will save it.
Hold out time, not rows. A spread that mean-reverts in 2024 can drift all of 2025.
Deflate for the search. Every parameter you try raises the bar a real edge must clear. Compute it.
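"Hold out time, not rows" comes down to cutting the series at a single point in time instead of sampling bars at random. A minimal sketch, with a hypothetical helper name and the 40% hold-out used in this study as the default:

```python
def time_split(series, oos_fraction=0.4):
    """Split an ordered series at one point in time.

    The earliest bars become in-sample and the latest bars out-of-sample,
    so nothing from the future leaks into the fit.
    """
    n_oos = round(len(series) * oos_fraction)
    cut = len(series) - n_oos
    return series[:cut], series[cut:]

bars = list(range(10))                 # stand-in for time-ordered bars
in_sample, out_of_sample = time_split(bars)
```

A random row split would scatter 2025 bars into the training set and hide exactly the regime drift that sinks this pair.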

Check the math yourself

Tool: Deflated Sharpe Ratio. What does your backtest's headline survive after 300 trials?
Guide: Backtest overfitting. Why the best in-sample curve is usually the luckiest one.

Educational analysis, not investment advice. This is a methodology case study, not a recommendation to trade any strategy or instrument. Simulated and optimized results have severe limitations and do not predict future performance. See the full disclaimer.