The validation gauntlet
This is the module the whole site is built around. A backtest that looks good proves almost nothing: anyone can find a rule that fit the past. These are the checks that separate a real edge from a lucky-looking curve, and the order to run them in.
- Overfitting — fitting the noise instead of the signal
- In-sample vs out-of-sample — and walk-forward testing
- Multiple testing — why a big search makes great results suspicious
- The Deflated Sharpe Ratio and trading costs
Overfitting: fitting the noise
Price history is part signal (real, repeatable structure) and mostly noise (random wiggles that will never repeat). Overfitting is tuning a strategy so tightly that it memorises the noise. It scores beautifully on the past and tells you nothing about the future.
The signature of overfitting: a spectacular backtest Sharpe (say 3+) that collapses to mediocrity (0.4) the moment it meets new data. If a result looks too good, it usually is.
In-sample, out-of-sample, walk-forward
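The first defence is simple discipline: never judge a strategy on the data you tuned it on. Split your history in two: an in-sample portion you tune on, and an out-of-sample portion you hold back and never touch while tuning.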
Tune all you want on the in-sample data. Then run once on the out-of-sample data the strategy has never seen. If the edge survives, it might be real; if it evaporates, you overfit. Walk-forward testing repeats this honestly across time: fit on a window, test on the next, roll forward, repeat — so the verdict isn't a single lucky split.
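The fit-test-roll loop can be sketched in a few lines. This is a minimal illustration, not a full backtesting harness: the function name `walk_forward_splits` and its parameters are hypothetical, and it assumes your history is a simple sequence of observations indexed 0, 1, 2, and so on.

```python
from typing import Iterator, Tuple


def walk_forward_splits(
    n_obs: int, train_size: int, test_size: int
) -> Iterator[Tuple[range, range]]:
    """Yield (train, test) index ranges that roll forward through time."""
    start = 0
    while start + train_size + test_size <= n_obs:
        train = range(start, start + train_size)
        test = range(start + train_size, start + train_size + test_size)
        yield train, test
        start += test_size  # roll forward by one test window


# Toy history of 10 observations: fit on 4, test on the next 2, roll, repeat.
splits = list(walk_forward_splits(n_obs=10, train_size=4, test_size=2))
for train, test in splits:
    print(list(train), "->", list(test))
```

Every test window sits strictly after its training window, so the strategy is only ever judged on data it could not have seen while being fitted.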
Multiple testing: the lottery problem
Here's the trap that catches almost everyone. Try one strategy and a great result means something. Try a thousand and keep the best, and a great result means almost nothing: with that many tries, one is bound to look brilliant by pure chance, the way someone always wins the lottery even though any single ticket almost certainly won't.
Every parameter you sweep, every variant you eyeball and discard, counts as a try. The more you searched, the higher the bar your "winner" must clear to be believed. This is the single most underrated reason backtests fail live.
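You can watch the lottery effect happen with a small simulation. The sketch below is illustrative only: it assumes every "strategy" is pure noise (daily returns drawn from a normal distribution with zero mean and 1% volatility, over one 252-day year), and the names `sharpe` and `best_of_n_sharpe` are made up for this example.

```python
import random
import statistics


def sharpe(returns):
    """Per-period Sharpe ratio (mean over standard deviation, zero risk-free rate)."""
    return statistics.mean(returns) / statistics.stdev(returns)


def best_of_n_sharpe(n_strategies, n_days=252, seed=0):
    """Best annualised Sharpe among n_strategies that have NO real edge."""
    rng = random.Random(seed)
    best = float("-inf")
    for _ in range(n_strategies):
        # Assumption: zero-mean noise, so any positive Sharpe is luck.
        daily = [rng.gauss(0.0, 0.01) for _ in range(n_days)]
        best = max(best, sharpe(daily) * 252 ** 0.5)
    return best


single = best_of_n_sharpe(1)
best = best_of_n_sharpe(1000)
print(f"one try: {single:.2f}   best of 1000 tries: {best:.2f}")
```

None of these strategies has an edge, yet the best of a thousand will typically show an annualised Sharpe that would look excellent in a backtest report. That gap is what the search alone manufactures.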
The Deflated Sharpe Ratio
So how do you put a number on "I tried a lot of variants, how impressed should I be?" The Deflated Sharpe Ratio (DSR), from Bailey & López de Prado, does exactly that: it discounts a Sharpe ratio for how many configurations you tested. A Sharpe of 2 from a single idea is impressive; the same 2 as the best of 500 tries might be worth almost nothing, and the DSR makes that explicit.
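As a rough sketch of the calculation, the code below follows the formula as published by Bailey & López de Prado: estimate the Sharpe you would expect from the best of N no-edge trials, then ask how likely it is that your observed Sharpe truly exceeds that bar. The function names are hypothetical. The inputs are assumptions you must supply yourself: `sr` is the per-period (not annualised) Sharpe, `sharpe_std` is the standard deviation of per-period Sharpe ratios across the variants you tried, and `kurt` is plain kurtosis (3 for normal returns). Treating a single trial as a benchmark of zero is a simplification made here.

```python
import math
from statistics import NormalDist

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def expected_max_sharpe(n_trials, sharpe_std):
    """Sharpe expected from the best of n_trials when no variant has a real edge."""
    if n_trials < 2:
        return 0.0  # assumption: with a single trial there is nothing to deflate
    z = NormalDist()
    return sharpe_std * (
        (1 - EULER_GAMMA) * z.inv_cdf(1 - 1 / n_trials)
        + EULER_GAMMA * z.inv_cdf(1 - 1 / (n_trials * math.e))
    )


def deflated_sharpe(sr, n_obs, n_trials, sharpe_std, skew=0.0, kurt=3.0):
    """Probability that the true Sharpe beats the best-of-n_trials benchmark."""
    sr0 = expected_max_sharpe(n_trials, sharpe_std)
    denom = math.sqrt(1 - skew * sr + (kurt - 1) / 4 * sr ** 2)
    return NormalDist().cdf((sr - sr0) * math.sqrt(n_obs - 1) / denom)


# Illustrative inputs: annualised Sharpe of 2 on five years of daily data.
daily_sr = 2 / math.sqrt(252)
one_idea = deflated_sharpe(daily_sr, n_obs=1260, n_trials=1, sharpe_std=0.05)
best_of_500 = deflated_sharpe(daily_sr, n_obs=1260, n_trials=500, sharpe_std=0.05)
print(f"single idea: {one_idea:.3f}   best of 500: {best_of_500:.3f}")
```

With these made-up inputs the same Sharpe of 2 goes from near-certain to less likely than not once it is known to be the pick of 500 tries. The exact numbers depend heavily on `sharpe_std`, which in practice has to be estimated from the variants you actually ran.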
Costs: gross is a fantasy
The last gate is the most concrete. Every trade pays commissions and loses a little to the spread and slippage (the gap between the price you wanted and the price you got). A backtest that ignores them reports a gross return that no one can actually capture; the net return after costs is the only one that matters.
Many real edges are smaller than their costs — genuinely present, and still unprofitable. The faster a strategy trades, the more costs eat. A high-frequency idea can have a real edge and lose money on friction alone.
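The arithmetic is worth doing by hand at least once. The sketch below uses a deliberately crude model: costs are a fixed fraction of capital per round-trip trade and simply subtract from the gross annual return. The function names and all the figures are invented for illustration.

```python
def net_return(gross_return, trades_per_year, cost_per_trade):
    """Annual return left after paying cost_per_trade on every trade."""
    return gross_return - trades_per_year * cost_per_trade


def break_even_cost(gross_return, trades_per_year):
    """Largest per-trade cost the strategy can absorb before net return hits zero."""
    return gross_return / trades_per_year


# Assumed figures: 10% gross per year, 0.05% all-in cost per round trip.
gross, cost = 0.10, 0.0005
slow = net_return(gross, trades_per_year=50, cost_per_trade=cost)
fast = net_return(gross, trades_per_year=500, cost_per_trade=cost)
print(f"50 trades/yr: {slow:+.1%}   500 trades/yr: {fast:+.1%}")
print(f"break-even cost at 500 trades/yr: {break_even_cost(gross, 500):.3%}")
```

The same gross edge is comfortably profitable at 50 trades a year and deeply negative at 500, which is the point of the paragraph above: turnover multiplies friction, and the break-even cost shrinks as trading speeds up.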
The gauntlet, in order
1. Out-of-sample (does it survive unseen data?) → 2. Costs (does it survive net of friction?) → 3. Multiple-testing / DSR (is it real after the search?) → 4. Survivorship & look-ahead (was the data honest?). A strategy worth trading clears all four. Most clear none.
This module is the spine of everything here. For a deeper, worked treatment with more examples, read the companion guide to backtest overfitting — then watch the gauntlet run on real published strategies in the teardowns.
- Overfitting: tuning a strategy so tightly it captures noise, not signal.
- Out-of-sample: data held back and never used for tuning, used once to judge honestly.
- Walk-forward: repeatedly fitting on one window and testing on the next, rolling through time.
- Multiple testing: trying many variants, which inflates the best result by chance.
- Deflated Sharpe Ratio: a Sharpe discounted for the number of variants tried.
- Net return: return after commissions, spread, and slippage; the only one you can keep.
Where to go next
You can now tell a real edge from a lucky one — the rarest skill in retail trading. Module 6 (coming soon) turns to staying alive: risk of ruin, position sizing math, and the gap between a backtest and a human holding it.
Run the gauntlet yourself — free
- Tool: Deflated Sharpe Ratio. Discount your Sharpe for the number of variants you tried.
- Tool: Net-vs-Gross Costs. Find the break-even cost your strategy must clear.
- Deep-dive guide: Backtest overfitting. The longer, worked companion to this module.