Where data lies to you

Before you write a single rule, your data can already be cheating. Three quiet biases — wrong prices, missing losers, and peeking at the future — inflate a backtest so smoothly you'll never feel it happen. Here's how to spot each one.

What you'll learn

OHLCV — what a price bar actually contains
Adjusted prices — why raw prices show crashes that never happened
Survivorship bias — the missing losers that flatter every backtest
Look-ahead bias — using information you couldn't have had yet

Module 3 of 8 · ~9 min read · no math

The raw material: OHLCV

Almost every backtest is built on OHLCV bars — for each period (a day, an hour, a minute), five numbers: the Open, the High, the Low, the Close, and the Volume traded. That's it. The close is the headline price; the high and low show the range; volume shows how much changed hands. Simple — and quietly full of traps.
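As a minimal sketch of what one bar holds, the Python below models a bar and checks that its five numbers agree with each other. The `Bar` class and the `is_consistent` helper are hypothetical names made up for this illustration, not part of any data library, and the prices are invented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bar:
    """One OHLCV bar for a single period (hypothetical illustration)."""
    open: float
    high: float
    low: float
    close: float
    volume: float


def is_consistent(bar: Bar) -> bool:
    """Return True if the bar's five numbers agree with each other."""
    return (
        bar.low <= min(bar.open, bar.close)       # low is the bottom of the range
        and bar.high >= max(bar.open, bar.close)  # high is the top of the range
        and bar.low > 0                           # prices must be positive
        and bar.volume >= 0                       # volume cannot be negative
    )


good = Bar(open=100.0, high=103.5, low=99.2, close=102.1, volume=1_250_000)
bad = Bar(open=100.0, high=101.0, low=99.0, close=104.0, volume=1_250_000)  # close above high

print(is_consistent(good))  # True
print(is_consistent(bad))   # False
```

A check like this will not catch every trap, but a bar whose close sits above its high is a sign the data needs cleaning before any rule is tested on it.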

Adjusted prices: the crash that never happened

A company does a 2-for-1 stock split: every share becomes two, each worth half as much. Nothing about the business changed, but the raw price drops from $200 to $100 overnight. A naive backtest sees a −50% day and either fires a panic exit or books a loss that never happened.

The fix is adjusted prices (also called "back-adjusted" prices). They rescale the history before each split or dividend so those events don't show up as phantom jumps, and the series reflects what a buy-and-hold investor actually experienced. Always backtest on adjusted prices. Most data libraries can hand them to you directly; just make sure you asked for them.
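To make the idea concrete, here is a rough sketch of back-adjusting a series for a single split. The function names and the prices are made up for illustration, and the sketch ignores dividends, which real adjusted series also account for.

```python
def back_adjust_for_split(prices, split_index, ratio):
    """Divide every price before the split by the split ratio.

    prices      -- raw closing prices, oldest first
    split_index -- index of the first bar that trades at the post-split price
    ratio       -- new shares per old share (2.0 for a 2-for-1 split)
    """
    return [p / ratio if i < split_index else p for i, p in enumerate(prices)]


def daily_returns(prices):
    """Simple close-to-close returns."""
    return [b / a - 1 for a, b in zip(prices, prices[1:])]


# Made-up raw closes around a 2-for-1 split that takes effect at index 3.
raw = [198.0, 200.0, 200.0, 100.0, 101.0]
adjusted = back_adjust_for_split(raw, split_index=3, ratio=2.0)

print(adjusted)                      # [99.0, 100.0, 100.0, 100.0, 101.0]
print(min(daily_returns(raw)))       # -0.5, the phantom crash
print(min(daily_returns(adjusted)))  # 0.0, the crash is gone
```

Note that the adjustment changes the old prices, not the new ones, so the most recent bar always matches the price you would see quoted today.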

Quick rule. If a stock shows a clean −50%, −67%, or −80% single-day move years ago, suspect a split before you suspect a crash. Real crashes are messy; splits are suspiciously round.
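The quick rule can be turned into a rough screen. The sketch below flags daily moves that land close to the returns implied by 2-for-1, 3-for-1, and 5-for-1 splits. The `suspected_splits` helper, the 1% tolerance, and the prices are all assumptions for illustration; a flag is a prompt to check the corporate-action record, not proof of a split.

```python
# Single-day returns implied by the split ratios behind the quick rule:
# 2-for-1 -> -1/2, 3-for-1 -> -2/3, 5-for-1 -> -4/5.
SPLIT_RETURNS = {"2-for-1": -1 / 2, "3-for-1": -2 / 3, "5-for-1": -4 / 5}


def suspected_splits(closes, tolerance=0.01):
    """Return (index, label) for each daily move within `tolerance`
    of a split-shaped return. The tolerance is an assumed value."""
    flagged = []
    for i in range(1, len(closes)):
        ret = closes[i] / closes[i - 1] - 1
        for label, target in SPLIT_RETURNS.items():
            if abs(ret - target) <= tolerance:
                flagged.append((i, label))
    return flagged


# Made-up raw closes: a messy real drop at index 2, a 3-for-1 split at index 4.
closes = [150.0, 151.5, 118.2, 120.0, 40.1, 40.6]
print(suspected_splits(closes))  # [(4, '3-for-1')]
```

The roughly −22% drop at index 2 is not flagged because it does not sit near any split-shaped return, which matches the rule: real crashes are messy.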

Survivorship bias: the missing graveyard

This is the big one, and it's almost invisible. Suppose you test a strategy on "today's S&P 500." Sounds reasonable — except every company in that list is one that survived to today. The ones that went bankrupt, got delisted, or were swallowed in a fire-sale merger are gone from your dataset entirely.

Your backtest only ever sees the survivors. Every company that failed has quietly fallen out of the dataset, so the past looks far safer than it was.
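A toy comparison shows the size of the distortion. The tickers and returns below are invented purely for illustration; the point is only how the average shifts once the failed names are put back.

```python
from statistics import mean

# Made-up universe: total return over the test window and whether the
# company still trades today. Tickers and numbers are illustrative only.
universe = {
    "AAA": (0.80, True),
    "BBB": (0.35, True),
    "CCC": (0.10, True),
    "DDD": (-1.00, False),  # went bankrupt
    "EEE": (-0.70, False),  # delisted after a fire-sale merger
}

survivors_only = mean(r for r, alive in universe.values() if alive)
full_history = mean(r for r, _ in universe.values())

print(f"survivors only: {survivors_only:+.1%}")  # +41.7%
print(f"full universe:  {full_history:+.1%}")    # -9.0%
```

With these made-up numbers, the same equal-weight portfolio flips from a strong gain to a loss once the two dead companies are counted.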

The effect is brutal for almost any long strategy, and worst for the riskiest ones. A momentum system that buys hot small-caps looks brilliant when the small-caps that blew up to zero have been deleted from history. The strategy never "experiences" the disasters it would have walked straight into. The fix is a point-in-time dataset that includes delisted names — expensive and rare, which is exactly why so many published backtests quietly skip it.
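In code, a point-in-time universe boils down to asking "who was a member on this date?" instead of "who is a member today?". The sketch below assumes a hypothetical membership table with add and remove dates; the record layout, tickers, and dates are invented and do not describe any vendor's format.

```python
from datetime import date

# Hypothetical membership records: (ticker, date added, date removed).
# `None` means the name is still in the index today.
MEMBERSHIP = [
    ("AAA", date(2005, 1, 3), None),
    ("DDD", date(2003, 6, 2), date(2009, 3, 16)),  # later delisted
    ("FFF", date(2015, 9, 1), None),               # joined after the first test date
]


def members_as_of(records, as_of):
    """Return the tickers that were index members on `as_of`."""
    return sorted(
        ticker
        for ticker, added, removed in records
        if added <= as_of and (removed is None or as_of < removed)
    )


print(members_as_of(MEMBERSHIP, date(2008, 1, 2)))  # ['AAA', 'DDD']
print(members_as_of(MEMBERSHIP, date(2020, 1, 2)))  # ['AAA', 'FFF']
```

A backtest dated 2008 that uses the 2020 list would wrongly include FFF and, more importantly, would never hold DDD on its way to being delisted.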

We hit this in a real teardown. Our momentum stress test was built on current index members, so even its honest numbers are an optimistic upper bound — we flagged it loudly rather than pretend otherwise.

Look-ahead bias: peeking at the future

Look-ahead bias is using information at a moment you couldn't actually have had it. It's the most embarrassing bug in quant because it's so easy to write by accident and it makes a strategy look amazing.

Classic ways to peek:

Using a day's close to decide a trade you place at that day's open — you traded on a price that hadn't printed yet.
Normalising the whole series by its full-period average or maximum — those values secretly contain the future.
Using restated earnings or index membership "as known today," not as they were reported at the time.
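The normalisation trap from the list above is easy to see side by side. This is a minimal sketch with made-up prices; both function names are hypothetical. The first version divides by the maximum of the whole series, the second only by the highest price seen up to each bar.

```python
def normalise_full_period(prices):
    """Look-ahead version: divides by the maximum of the whole series,
    which includes prices that had not happened yet."""
    peak = max(prices)
    return [p / peak for p in prices]


def normalise_point_in_time(prices):
    """Safer version: divides each price by the highest price seen so far."""
    out, running_peak = [], float("-inf")
    for p in prices:
        running_peak = max(running_peak, p)
        out.append(p / running_peak)
    return out


prices = [50.0, 60.0, 55.0, 100.0]  # made-up closes

print([round(x, 3) for x in normalise_full_period(prices)])    # [0.5, 0.6, 0.55, 1.0]
print([round(x, 3) for x in normalise_point_in_time(prices)])  # [1.0, 1.0, 0.917, 1.0]
```

On the very first bar the full-period version reports 0.5, which is only possible if you already know the price will later double. The point-in-time version reports 1.0, because on that day 50 was the highest price anyone had seen.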

The tell is a backtest that looks too good — an equity curve that glides up with tiny drawdowns and an implausible Sharpe. When a result looks like free money, the first suspect is always look-ahead bias.
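One common sanity check, offered here as a sketch and not as a complete audit, is to delay the signal by one bar and see whether the edge survives. The bars, the signal rule, and the `total_return` helper below are all made up for illustration. The peeking version decides each day's open-to-close trade with a signal that needs that same day's close.

```python
# Made-up daily bars as (open, close) pairs.
bars = [(100.0, 102.0), (102.0, 101.0), (101.0, 104.0), (104.0, 103.0), (103.0, 106.0)]

# Signal: 1 on days that closed above their open, else 0.
signal = [1 if close > open_ else 0 for open_, close in bars]

# Return earned by buying at the open and selling at the close of each day.
intraday = [close / open_ - 1 for open_, close in bars]


def total_return(positions, returns):
    """Compound the returns of the days on which a position was held."""
    growth = 1.0
    for held, r in zip(positions, returns):
        growth *= 1 + held * r
    return growth - 1


# Peeking: today's signal (which needs today's close) picks today's open trade.
peeking = total_return(signal, intraday)

# Honest: each day's trade uses the previous day's signal; day 0 has none.
honest = total_return([0] + signal[:-1], intraday)

print(f"peeking: {peeking:+.2%}")  # positive: it only ever holds the up days
print(f"honest:  {honest:+.2%}")   # negative on this made-up data
```

If a one-bar delay turns a glide path into a loss, as it does on this toy data, the original result was most likely trading on information it could not have had.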

Key terms from this module

OHLCV: Open, High, Low, Close, Volume — the five numbers in a price bar.
Adjusted price: Price rewritten so splits and dividends don't appear as phantom jumps.
Survivorship bias: Testing only on companies that survived, so the failures are invisible.
Look-ahead bias: Using information you couldn't have had at the moment of the decision.
Point-in-time data: A dataset that reflects exactly what was known on each historical date, delisted names included.

Where to go next

Clean data in hand, you're ready to build something. Module 4 assembles your first real strategy — signal, entry, exit, and size — with a worked moving-average example you can follow end to end.

See a bias caught in the wild

Teardown
How we quantified survivorship bias in a momentum strategy — and why it makes good numbers an upper bound

Educational content, not investment advice. This lesson explains concepts and methods only. Nothing here recommends any security, strategy, or trade, or promises any outcome. Trading involves risk of loss. See the disclaimer.