Where data lies to you
Before you write a single rule, your data can already be cheating. Three quiet biases — wrong prices, missing losers, and peeking at the future — inflate a backtest so smoothly you'll never feel it happen. Here's how to spot each one.
- OHLCV — what a price bar actually contains
- Adjusted prices — why raw prices show crashes that never happened
- Survivorship bias — the missing losers that flatter every backtest
- Look-ahead bias — using information you couldn't have had yet
The raw material: OHLCV
Almost every backtest is built on OHLCV bars — for each period (a day, an hour, a minute), five numbers: the Open, the High, the Low, the Close, and the Volume traded. That's it. The close is the headline price; the high and low show the range; volume shows how much changed hands. Simple — and quietly full of traps.
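One of the simplest traps is a bar whose five numbers contradict each other. The sketch below is a minimal illustration, not a standard API: the `Bar` container and the `bar_problems` helper are hypothetical names introduced here, and the checks are only the most basic consistency rules.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bar:
    """One OHLCV bar for a single period (hypothetical container for illustration)."""
    open: float
    high: float
    low: float
    close: float
    volume: float


def bar_problems(bar: Bar) -> list:
    """Return a list of consistency problems; an empty list means the bar looks sane."""
    problems = []
    if bar.high < max(bar.open, bar.close):
        problems.append("high is below open or close")
    if bar.low > min(bar.open, bar.close):
        problems.append("low is above open or close")
    if bar.low <= 0:
        problems.append("non-positive price")
    if bar.volume < 0:
        problems.append("negative volume")
    return problems


# Toy bars with invented numbers.
good = Bar(open=100.0, high=103.5, low=99.2, close=102.1, volume=1_250_000)
bad = Bar(open=100.0, high=101.0, low=99.0, close=102.1, volume=1_250_000)

print(bar_problems(good))  # []
print(bar_problems(bad))   # ['high is below open or close']
```

Running a check like this over a whole dataset before backtesting is cheap, and a non-empty result usually points to a vendor error or a botched merge rather than a real market event.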
Adjusted prices: the crash that never happened
A company does a 2-for-1 stock split: every share becomes two, each worth half as much. Nothing about the business changed, but the raw price drops from $200 to $100 overnight. A naive backtest sees a −50% day and either panics or books a fake loss.
The fix is adjusted (or "back-adjusted") prices. They rescale the history before each split or dividend so those events don't show up as phantom jumps, and the series reflects the return a buy-and-hold investor actually earned. Always compute backtest returns on adjusted prices. Most data libraries can hand them to you directly; just check that adjusted prices are what you actually asked for.
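To make the 2-for-1 example concrete, here is a minimal sketch of back-adjusting for splits. It assumes a plain list of raw closes and a hypothetical `splits` mapping (index of the first post-split bar to split ratio); it ignores dividends entirely, and real data vendors may use different conventions.

```python
def back_adjust_for_splits(closes, splits):
    """Divide every bar by the product of all split ratios that happen after it.

    closes: raw closing prices in date order.
    splits: dict mapping the index of the first post-split bar to the split
            ratio (2.0 for a 2-for-1 split). Hypothetical input format.
    """
    adjusted = list(closes)
    factor = 1.0
    # Walk backwards so the factor accumulates every later split.
    for i in range(len(closes) - 1, -1, -1):
        adjusted[i] = closes[i] / factor
        if i in splits:
            factor *= splits[i]
    return adjusted


def simple_returns(prices):
    """Period-over-period percentage change."""
    return [prices[i] / prices[i - 1] - 1.0 for i in range(1, len(prices))]


# Toy series: the stock closes at $200, then splits 2-for-1 before the next bar.
raw = [198.0, 200.0, 100.0, 101.0]
adjusted = back_adjust_for_splits(raw, splits={2: 2.0})

print(adjusted)                    # [99.0, 100.0, 100.0, 101.0]
print(simple_returns(raw)[1])      # -0.5, the crash that never happened
print(simple_returns(adjusted)[1]) # 0.0
```

Only the bars before the split change. The most recent prices still match what is quoted today, which is why adjusted histories are rewritten every time a new split or dividend occurs.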
Survivorship bias: the missing graveyard
This is the big one, and it's almost invisible. Suppose you test a strategy on "today's S&P 500." Sounds reasonable — except every company in that list is one that survived to today. The ones that went bankrupt, got delisted, or were swallowed in a fire-sale merger are gone from your dataset entirely.
The effect is brutal for almost any long strategy, and worst for the riskiest ones. A momentum system that buys hot small-caps looks brilliant when the small-caps that blew up to zero have been deleted from history. The strategy never "experiences" the disasters it would have walked straight into. The fix is a point-in-time dataset that includes delisted names — expensive and rare, which is exactly why so many published backtests quietly skip it.
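The gap between a survivors-only universe and a point-in-time one can be sketched with a toy example. Everything here is invented for illustration: the tickers, dates, and returns are made up, and `Listing` and `universe_on` are hypothetical names, not part of any data vendor's API.

```python
from datetime import date
from typing import NamedTuple, Optional


class Listing(NamedTuple):
    ticker: str
    listed: date
    delisted: Optional[date]  # None means still trading today
    period_return: float      # toy return over the test period, delisting loss included


# Invented records: three survivors and two names that later collapsed.
LISTINGS = [
    Listing("AAA", date(2005, 1, 3), None, 0.40),
    Listing("BBB", date(2008, 6, 2), None, 0.15),
    Listing("CCC", date(2010, 3, 1), None, -0.10),
    Listing("DDD", date(2009, 9, 1), date(2016, 4, 15), -1.00),
    Listing("EEE", date(2011, 2, 1), date(2017, 11, 30), -0.85),
]


def universe_on(day, listings):
    """Names that were tradable on `day`, whether or not they survive to today."""
    return [
        item for item in listings
        if item.listed <= day and (item.delisted is None or day < item.delisted)
    ]


def mean_return(listings):
    """Equal-weight average of the toy period returns."""
    return sum(item.period_return for item in listings) / len(listings)


start = date(2015, 1, 2)
point_in_time = universe_on(start, LISTINGS)
survivors_only = [item for item in point_in_time if item.delisted is None]

print(f"survivors only: {mean_return(survivors_only):+.1%}")  # +15.0%
print(f"point in time:  {mean_return(point_in_time):+.1%}")   # -28.0%
```

The same equal-weight rule flips from a gain to a loss once the two delisted names are allowed back into the test. The size of the gap here is an artifact of the invented numbers; the direction is the point.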
We hit this in a real teardown. Our momentum stress test was built on current index members, so even its honest numbers are an optimistic upper bound — we flagged it loudly rather than pretend otherwise.
Look-ahead bias: peeking at the future
Look-ahead bias is using information before you could actually have had it. It's the most embarrassing bug in quant work because it's so easy to write by accident and it makes a strategy look amazing.
Classic ways to peek:
- Using a day's close to decide a trade you place at that day's open — you traded on a price that hadn't printed yet.
- Normalising the whole series by its full-period average or maximum — those values secretly contain the future.
- Using restated earnings or index membership "as known today," not as they were reported at the time.
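The first two items in that list have simple mechanical fixes, sketched below under stated assumptions: signals are computed from a bar's close and can only be traded on the next bar, and normalisation uses only values seen so far. The function names and the toy price series are illustrative, not taken from any library.

```python
def lag_signal(signals, fill=0):
    """Shift signals one bar later: a signal from bar t's close trades on bar t+1."""
    return [fill] + list(signals[:-1])


def full_period_normalise(prices):
    """LEAKS: divides every bar by the maximum of the whole series, future included."""
    peak = max(prices)
    return [p / peak for p in prices]


def expanding_normalise(prices):
    """Divides each bar by the maximum seen up to and including that bar only."""
    out, peak = [], float("-inf")
    for p in prices:
        peak = max(peak, p)
        out.append(p / peak)
    return out


# Toy closes with invented numbers.
closes = [100.0, 102.0, 101.0, 105.0, 110.0, 108.0]

# Signal known only after each close: 1 if the close rose, else 0.
up_day = [0] + [1 if closes[i] > closes[i - 1] else 0 for i in range(1, len(closes))]
positions = lag_signal(up_day)

print(up_day)     # [0, 1, 0, 1, 1, 0]
print(positions)  # [0, 0, 1, 0, 1, 1]

# The first bar's normalised value depends on a price four bars in the future
# in the leaky version, and on nothing but itself in the expanding version.
print(full_period_normalise(closes)[0] < 1.0)  # True
print(expanding_normalise(closes)[0])          # 1.0
```

The third item, restated fundamentals and index membership, cannot be fixed in code alone. It needs data that records what was known on each date.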
The tell is a backtest that looks too good — an equity curve that glides up with tiny drawdowns and an implausible Sharpe. When a result looks like free money, the first suspect is always look-ahead bias.
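One way to test that suspicion is a truncation check: recompute a feature on a shortened history and confirm that past values do not change when future bars are removed. The sketch below is a hedged illustration with hypothetical helper names. It only catches leaks inside the calculation itself, so it will not detect restated or survivor-only input data.

```python
import math


def leaks_future(feature_fn, prices, tol=1e-12):
    """Return True if any past feature value changes when later bars are removed."""
    full = feature_fn(prices)
    for cut in range(1, len(prices)):
        truncated = feature_fn(prices[:cut])
        for a, b in zip(truncated, full[:cut]):
            if not math.isclose(a, b, rel_tol=0.0, abs_tol=tol):
                return True
    return False


def demean_full(prices):
    """LEAKS: subtracts the mean of the entire series from every bar."""
    mean = sum(prices) / len(prices)
    return [p - mean for p in prices]


def demean_expanding(prices):
    """Subtracts the mean of the bars seen so far, current bar included."""
    out, total = [], 0.0
    for count, p in enumerate(prices, start=1):
        total += p
        out.append(p - total / count)
    return out


# Toy closes with invented numbers.
closes = [100.0, 102.0, 101.0, 105.0, 110.0, 108.0]

print(leaks_future(demean_full, closes))       # True
print(leaks_future(demean_expanding, closes))  # False
```

A feature that passes is not proven clean, but a feature that fails is definitely using the future.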
- OHLCV: Open, High, Low, Close, and Volume, the five numbers in a price bar.
- Adjusted price: Price rewritten so splits and dividends don't appear as phantom jumps.
- Survivorship bias: Testing only on companies that survived, so the failures are invisible.
- Look-ahead bias: Using information you couldn't have had at the moment of the decision.
- Point-in-time data: A dataset that reflects exactly what was known on each historical date, delisted names included.
Where to go next
Clean data in hand, you're ready to build something. Module 4 assembles your first real strategy — signal, entry, exit, and size — with a worked moving-average example you can follow end to end.
See a bias caught in the wild
Teardown: How we quantified survivorship bias in a momentum strategy, and why it makes good numbers an upper bound