What is a good Deflated Sharpe Ratio?

A DSR of 0.95 or above is commonly treated as statistically significant: roughly a 95% probability the result is not an artifact of multiple testing. Values below 0.90 suggest the backtest may be overfit.

Backtest diagnostics

Is your Sharpe ratio real, or did you just test enough times?

Run enough variants and a worthless strategy will eventually post a great Sharpe by luck. This tool discounts that selection effect and returns the Deflated Sharpe Ratio — the probability your backtest is genuine. Method: Bailey & López de Prado (2014).

The instrument

inputs

1. Your strategy

Sharpe ratio (annualized)

Observations (n)

Frequency

Skewness (γ₃)

Kurtosis (γ₄, normal=3)

2. Multiple testing

Variants tested (N) — configs, parameter sets, ideas you tried

Trial Sharpe ratios (annualized, optional) — paste all variant Sharpes for an exact deflation

If left blank, the trial spread is approximated from sampling error and flagged as an estimate. Pasting the real list is strongly preferred.

Benchmark Sharpe (annualized) — the bar to beat

3. Confidence

Target confidence for track-record length

The verdict


Enter a strategy and a variant count, then run the diagnostic to see whether the result clears the luck line.

How to read this

A backtest reports one number — the Sharpe ratio of the strategy you decided to keep. What it usually hides is how many strategies you didn't keep. The more variants you test, the higher the Sharpe you should expect to find by chance alone, even with no real edge.

Probabilistic Sharpe (PSR)

The probability your Sharpe is truly above the benchmark, given track length and non-normal returns. A first sanity check, ignoring multiple testing.
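As a rough illustration, the PSR can be sketched in a few lines of Python using the standard formula from Bailey & López de Prado. The function name, the argument names, and the 252-periods-per-year convention for daily data are assumptions made for this example, not the tool's actual code. The formula works on per-period Sharpe ratios, so an annualized figure is divided by the square root of the number of periods per year first.

```python
from math import sqrt
from statistics import NormalDist


def probabilistic_sharpe(sr, sr_benchmark, n_obs, skew=0.0, kurt=3.0):
    """PSR for per-period (non-annualized) Sharpe ratios.

    kurt is plain kurtosis (normal = 3), not excess kurtosis.
    """
    # Variance adjustment for non-normal returns.
    var_term = 1.0 - skew * sr + (kurt - 1.0) / 4.0 * sr ** 2
    if n_obs < 2 or var_term <= 0:
        raise ValueError("need n_obs >= 2 and a positive variance term")
    z = (sr - sr_benchmark) * sqrt(n_obs - 1) / sqrt(var_term)
    return NormalDist().cdf(z)


# Illustrative inputs: annualized Sharpe 1.5, two years of daily data,
# mild negative skew and fat tails. 252 periods per year is an assumption.
periods_per_year = 252
sr_daily = 1.5 / sqrt(periods_per_year)
psr = probabilistic_sharpe(sr_daily, 0.0, n_obs=504, skew=-0.5, kurt=5.0)
print(f"PSR = {psr:.3f}")
```

Shorter records, more negative skew, and fatter tails all pull the PSR down for the same headline Sharpe.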

The luck line

The Sharpe an unskilled strategy is expected to reach after N tries. This is the bar your result has to clear to mean anything.
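One way to compute such a luck line is the expected-maximum approximation from Bailey & López de Prado (2014), sketched below. The function and argument names are hypothetical, and the tool's own implementation may differ in detail. The spread argument is the standard deviation of the trial Sharpe ratios; the result comes back in the same units (annualized in, annualized out).

```python
from math import e
from statistics import NormalDist

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def expected_max_sharpe(n_trials, trial_sr_std):
    """Approximate expected best Sharpe among n_trials zero-skill strategies."""
    if n_trials < 2:
        raise ValueError("the approximation needs at least 2 trials")
    inv = NormalDist().inv_cdf
    return trial_sr_std * (
        (1.0 - EULER_GAMMA) * inv(1.0 - 1.0 / n_trials)
        + EULER_GAMMA * inv(1.0 - 1.0 / (n_trials * e))
    )


# The bar rises with every extra variant, even though skill stays at zero.
# A trial spread of 0.5 is an illustrative value, not a recommendation.
for n in (10, 100, 1000):
    print(n, round(expected_max_sharpe(n, 0.5), 3))
```

The growth is slow (roughly like the square root of the log of N), which is why ten tests cost little and ten thousand cost a lot.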

Deflated Sharpe (DSR)

PSR measured against the luck line instead of the benchmark. This is the honest probability that your edge is real, not the best of many guesses.
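Putting the pieces together, a self-contained sketch of the DSR might look like the following. It assumes the full list of annualized trial Sharpes is available, uses their sample standard deviation as the trial spread, and converts to per-period units with an assumed 252 periods per year. All names and sample numbers are illustrative, not taken from the tool.

```python
from math import e, sqrt
from statistics import NormalDist, stdev


def deflated_sharpe(sr_annual, trial_srs_annual, n_obs,
                    periods_per_year=252, skew=0.0, kurt=3.0):
    """DSR: PSR evaluated against the expected best-of-N Sharpe."""
    norm = NormalDist()
    n_trials = len(trial_srs_annual)
    if n_trials < 2 or n_obs < 2:
        raise ValueError("need at least 2 trials and 2 observations")
    gamma = 0.5772156649015329  # Euler-Mascheroni constant

    # Luck line in per-period units, from the spread of the trial Sharpes.
    spread = stdev(trial_srs_annual) / sqrt(periods_per_year)
    luck_line = spread * (
        (1.0 - gamma) * norm.inv_cdf(1.0 - 1.0 / n_trials)
        + gamma * norm.inv_cdf(1.0 - 1.0 / (n_trials * e))
    )

    # PSR with the luck line as the threshold.
    sr = sr_annual / sqrt(periods_per_year)
    denom = sqrt(1.0 - skew * sr + (kurt - 1.0) / 4.0 * sr ** 2)
    return norm.cdf((sr - luck_line) * sqrt(n_obs - 1) / denom)


# Made-up example: ten variants, the best one kept, five years of daily data.
trials = [0.2, -0.1, 0.5, 0.9, 1.4, 0.3, -0.4, 0.7, 1.1, 0.0]
dsr = deflated_sharpe(max(trials), trials, n_obs=1260)
print(f"DSR = {dsr:.3f}")
```

In this made-up case the selected Sharpe of 1.4 looks strong on its own, yet the deflated figure falls short of the 0.95 line once the ten trials are priced in.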

A DSR at or above 0.95 is the usual line for "statistically credible." Below 0.90, treat the result as probably overfit and either gather more out-of-sample data or test fewer, better-motivated ideas. For the full derivation and worked examples, see the backtest overfitting guide.

Questions

What is the Deflated Sharpe Ratio?

It is the probability that a strategy's Sharpe ratio is genuinely above the benchmark after correcting for the number of variants tested, the length of the record, and skew and kurtosis in the returns. A high Sharpe is cheap to find when you try many variants; the DSR prices that in.

How many backtests is too many?

There is no single number. Each additional variant raises the luck line. Ten well-motivated tests carry a far lower penalty than ten thousand brute-forced parameter combinations. The tool converts your count directly into the luck line so you can see the cost.

I don't know how many variants I tested.

Count every parameter set, threshold, and idea you evaluated on the same data — including the ones you discarded. If you optimized over a grid, the number of grid points is your N. When unsure, use the larger estimate; it is the conservative choice.
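For a grid search, the count is simply the product of the number of values per parameter, plus anything tried outside the grid. A small sketch, with hypothetical parameter names and values:

```python
from itertools import product

# Hypothetical parameter grid for a single strategy family.
grid = {
    "lookback": [20, 50, 100, 200],
    "entry_z": [1.0, 1.5, 2.0],
    "stop_loss": [0.02, 0.05],
}

# Every combination evaluated on the same data counts as one variant.
n_grid = len(list(product(*grid.values())))

# Ideas tried and discarded before the grid search count too (assumed figure).
discarded_ideas = 3
n_variants = n_grid + discarded_ideas
print(n_variants)
```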

Is this investment advice?

No. It is a statistical diagnostic for evaluating a backtest figure. It does not recommend any security, strategy, or trade, and says nothing about whether a strategy will work in the future.