Methodology

We don’t try to prove your strategy works. We try to kill it.

Most backtests are optimism with a chart. A single historical run tells you what happened once, on one ordering of trades, on data the strategy was built to fit. Strata treats every strategy as guilty until proven robust — three independent attempts to break it, each scored, none optional.

The null hypothesis: beat random, or it isn't edge

Any market with a drift will make a lot of random strategies look smart. So the first question isn't "did it make money?" — it's "did it make more money than strategies with no idea behind them at all?"

During a Builder search, the engine banks the net P&L of up to 5,000 randomly composed strategies run on exactly the same instrument, data, and period as yours. Your strategy gets a percentile rank against that distribution. A strategy at the 95th percentile cleared almost the entire random pile; a strategy at the 60th is statistically hard to tell apart from luck.
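As a rough sketch of the ranking step, assume percentile rank means the share of random strategies whose net P&L is strictly below yours. The page doesn't say how ties are handled, so that convention is an assumption. `percentile_rank` is a hypothetical name, not part of Strata.

```python
def percentile_rank(your_pnl: float, random_pnls: list[float]) -> float:
    """Percentage of random strategies whose net P&L is strictly below yours.

    Hypothetical helper; the tie-handling convention is an assumption.
    """
    if not random_pnls:
        raise ValueError("need at least one random strategy to rank against")
    below = sum(1 for p in random_pnls if p < your_pnl)
    return 100.0 * below / len(random_pnls)


# Toy distribution: 100 random strategies with net P&L 0, 1, ..., 99.
random_pnls = [float(x) for x in range(100)]
print(percentile_rank(95.0, random_pnls))  # 95.0
print(percentile_rank(60.0, random_pnls))  # 60.0
```

With the real 5,000-strategy pile, the only change is the size of `random_pnls`.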

This is the cheapest, most brutal filter we have, and it's worth 10% of the Strata Score. Plenty of nice-looking equity curves don't survive it.

Monte Carlo reshuffle: remove the luck of the draw

A backtest hands you one ordering of trades — the historical one. But max drawdown is mostly a property of trade order, and order is partly luck. The same trades in a crueler sequence can blow a drawdown limit the original backtest sailed past.
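To make the ordering point concrete, here is a small illustration with made-up trade P&Ls. The same six trades, with the same net result, produce very different max drawdowns depending on their sequence. `max_drawdown` measures the largest peak-to-trough drop in cumulative P&L from a flat start. It is an illustrative helper, not Strata's code.

```python
def max_drawdown(trade_pnls: list[float]) -> float:
    """Largest peak-to-trough drop in cumulative P&L, starting from a flat account."""
    equity = peak = worst = 0.0
    for pnl in trade_pnls:
        equity += pnl
        peak = max(peak, equity)
        worst = max(worst, peak - equity)
    return worst


historical = [3.0, -2.0, 3.0, -2.0, 3.0, -2.0]  # wins and losses alternate
crueler = [3.0, 3.0, 3.0, -2.0, -2.0, -2.0]     # same trades, losses bunched together

print(sum(historical), sum(crueler))                    # 3.0 3.0
print(max_drawdown(historical), max_drawdown(crueler))  # 2.0 6.0
```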

So the engine bootstraps your strategy's trade P&Ls: it resamples them with replacement, 1,000 times, and rebuilds the equity curve for each resample, so each path varies both the order and the mix of trades. From that distribution we report the 5th-, 50th-, and 95th-percentile Sharpe, the 95th-percentile max drawdown, and the share of resamples that finished profitable.
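A minimal sketch of that bootstrap follows, under assumptions the page doesn't pin down. Sharpe is computed per trade (mean over population standard deviation, unannualized), percentiles use linear interpolation, and drawdown is measured on cumulative P&L from a flat start. All function names are hypothetical.

```python
import math
import random
import statistics


def max_drawdown(pnls):
    """Largest peak-to-trough drop in cumulative P&L, starting from zero."""
    equity = peak = worst = 0.0
    for p in pnls:
        equity += p
        peak = max(peak, equity)
        worst = max(worst, peak - equity)
    return worst


def sharpe(pnls):
    """Per-trade Sharpe (mean / population stdev); 0.0 if there is no dispersion."""
    sd = statistics.pstdev(pnls)
    return statistics.mean(pnls) / sd if sd > 0 else 0.0


def percentile(values, q):
    """q-th percentile (0-100) with linear interpolation between sorted values."""
    s = sorted(values)
    k = (len(s) - 1) * q / 100
    lo, hi = math.floor(k), math.ceil(k)
    return s[lo] + (s[hi] - s[lo]) * (k - lo)


def monte_carlo(trade_pnls, n_resamples=1000, seed=0):
    """Bootstrap trade P&Ls with replacement and summarize the rebuilt equity curves."""
    rng = random.Random(seed)
    sharpes, drawdowns, profitable = [], [], 0
    for _ in range(n_resamples):
        # Same number of trades as the backtest, drawn with replacement.
        path = rng.choices(trade_pnls, k=len(trade_pnls))
        sharpes.append(sharpe(path))
        drawdowns.append(max_drawdown(path))
        if sum(path) > 0:
            profitable += 1
    return {
        "sharpe_p5": percentile(sharpes, 5),
        "sharpe_p50": percentile(sharpes, 50),
        "sharpe_p95": percentile(sharpes, 95),
        "max_dd_p95": percentile(drawdowns, 95),
        "share_profitable": profitable / n_resamples,
    }


trades = [120.0, -80.0, 45.0, -30.0, 200.0, -150.0, 60.0, -40.0]
print(monte_carlo(trades))
```

Fixing `seed` makes the report reproducible. The default of 1,000 resamples matches the count described above.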

The 95th-percentile drawdown is the number that matters for funded accounts: the prop-firm validator checks survival across the reshuffles, not just the one lucky path that actually occurred. This component carries 25% of the Strata Score.
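Here is a sketch of a survival check, with heavy assumptions. The real validator's rules (trailing versus static drawdown, daily loss limits, account size) aren't described here. This version treats survival as max drawdown on cumulative P&L staying strictly below a fixed limit, and it reports the share of bootstrap paths that pass. `survival_rate` and `dd_limit` are hypothetical names.

```python
import random


def max_drawdown(pnls):
    """Largest peak-to-trough drop in cumulative P&L, starting from zero."""
    equity = peak = worst = 0.0
    for p in pnls:
        equity += p
        peak = max(peak, equity)
        worst = max(worst, peak - equity)
    return worst


def survival_rate(trade_pnls, dd_limit, n_paths=1000, seed=0):
    """Share of bootstrap paths whose max drawdown stays strictly below dd_limit."""
    rng = random.Random(seed)
    survived = 0
    for _ in range(n_paths):
        path = rng.choices(trade_pnls, k=len(trade_pnls))
        if max_drawdown(path) < dd_limit:
            survived += 1
    return survived / n_paths


trades = [120.0, -80.0, 45.0, -30.0, 200.0, -150.0, 60.0, -40.0]
print(survival_rate(trades, dd_limit=300.0))
```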

Out-of-sample: the only result that counts is on data it never saw

A strategy tuned on six years of data will look brilliant on those six years. That's not insight, that's memorization. The honest test is performance on windows the strategy had no part in choosing.

Strata runs rolling walk-forward analysis: by default, six months of training data, then one month traded strictly out-of-sample, stepping forward window by window across the full history. The result is a live-like distribution — per-window Sharpe and the percentage of out-of-sample windows that were profitable — rather than one flattering aggregate number.
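Below is a minimal sketch of the window schedule and the two summary statistics. It assumes month-aligned windows (starting on the first of a month) that roll forward by one test period. It also assumes a per-window Sharpe computed on whatever P&L series, per trade or per day, you pass in. Strata's actual window boundaries and Sharpe convention are not stated, and the names used here are hypothetical.

```python
import statistics
from datetime import date


def add_months(d: date, n: int) -> date:
    """First day of the month n months after d (assumes month-aligned windows)."""
    m = d.month - 1 + n
    return date(d.year + m // 12, m % 12 + 1, 1)


def walk_forward_windows(start: date, end: date, train_months: int = 6, test_months: int = 1):
    """Rolling (train_start, test_start, test_end) windows; end dates are exclusive."""
    windows = []
    train_start = start
    while True:
        test_start = add_months(train_start, train_months)
        test_end = add_months(test_start, test_months)
        if test_end > end:
            break
        windows.append((train_start, test_start, test_end))
        train_start = add_months(train_start, test_months)  # roll forward one test window
    return windows


def oos_summary(oos_pnls_per_window):
    """Per-window Sharpe plus the percentage of out-of-sample windows that were profitable."""
    def sharpe(pnls):
        if not pnls:
            return 0.0
        sd = statistics.pstdev(pnls)
        return statistics.mean(pnls) / sd if sd > 0 else 0.0

    sharpes = [sharpe(w) for w in oos_pnls_per_window]
    profitable = sum(1 for w in oos_pnls_per_window if sum(w) > 0)
    return sharpes, 100.0 * profitable / len(oos_pnls_per_window)


windows = walk_forward_windows(date(2020, 1, 1), date(2021, 1, 1))
print(len(windows))  # 6
print(windows[0])    # (datetime.date(2020, 1, 1), datetime.date(2020, 7, 1), datetime.date(2020, 8, 1))
```

The strategy would be fit on each training slice and then traded, unchanged, on the test slice that follows. Only those test-slice P&Ls go into `oos_summary`.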

Out-of-sample stability is worth 20% of the Strata Score, and it's where most curve-fit strategies go to die.

What we deliberately don’t model

A methodology page that only lists strengths is marketing. These are the limits:

Order-book depth, queue position, and partial fills. No retail backtesting tool has reliable access to that data, so we don't simulate a fiction of it. Fills happen at the next bar's open, with slippage and commission you control. That is deliberately conservative.
Sub-minute microstructure. The engine is built on 1-minute OHLCV data from Databento (CME-sourced), and larger bar sizes are aggregated from that base. If your edge lives inside the 1-minute bar, Strata is the wrong tool and we'd rather say so.
The future. Every test on this page is a statement about how a strategy behaved under historical stress, reshuffled and held out in every way we know how. None of it is a forecast. Backtest results do not guarantee future returns.
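The next-bar-open fill rule with user-set slippage and commission can be sketched as follows. This assumes slippage is a fixed number of ticks that always moves the fill against you, and that commission is charged per contract per side. The contract specs in the example (0.25 tick size, $50 per point, E-mini S&P 500-style) and the function names are illustrative, not Strata's defaults.

```python
def fill_price(next_bar_open: float, side: int, slippage_ticks: int, tick_size: float) -> float:
    """Fill at the next bar's open, pushed against the trader by the slippage.

    side is +1 for a buy and -1 for a sell.
    """
    return next_bar_open + side * slippage_ticks * tick_size


def trade_pnl(entry_open, exit_open, side, qty, tick_size, point_value,
              slippage_ticks, commission_per_side):
    """Net P&L of one round trip; side is +1 for long, -1 for short."""
    entry = fill_price(entry_open, side, slippage_ticks, tick_size)
    exit_ = fill_price(exit_open, -side, slippage_ticks, tick_size)  # closing order is the opposite side
    gross = (exit_ - entry) * side * point_value * qty
    return gross - 2 * commission_per_side * qty


# Long 1 contract: entry at the next bar's open of 5000.00, exit at a later open of 5010.00.
print(trade_pnl(5000.00, 5010.00, side=+1, qty=1, tick_size=0.25,
                point_value=50.0, slippage_ticks=1, commission_per_side=2.50))  # 470.0
```

One tick of slippage on each side turns the 10-point move into 9.5 points ($475), and two $2.50 commissions bring the net to $470.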
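Building larger bars from a 1-minute base is a standard OHLCV aggregation: first open, highest high, lowest low, last close, summed volume. This sketch assumes clock-aligned buckets and a bar size that divides 60 minutes. How Strata aligns bars to CME sessions isn't stated, and the names here are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Bar:
    ts: datetime  # bar start time
    open: float
    high: float
    low: float
    close: float
    volume: int


def resample(bars: list[Bar], minutes: int) -> list[Bar]:
    """Aggregate time-sorted 1-minute bars into clock-aligned N-minute bars.

    Assumes `minutes` divides 60 (e.g. 5, 15, 30, 60).
    """
    out: list[Bar] = []
    for b in bars:
        bucket = b.ts.replace(second=0, microsecond=0) - timedelta(minutes=b.ts.minute % minutes)
        if out and out[-1].ts == bucket:
            agg = out[-1]
            agg.high = max(agg.high, b.high)
            agg.low = min(agg.low, b.low)
            agg.close = b.close
            agg.volume += b.volume
        else:
            out.append(Bar(bucket, b.open, b.high, b.low, b.close, b.volume))
    return out


start = datetime(2024, 1, 2, 9, 30)
one_minute = [
    Bar(start + timedelta(minutes=i), 100.0 + i, 101.0 + i, 99.0 + i, 100.5 + i, 10)
    for i in range(10)
]
for bar in resample(one_minute, 5):
    print(bar)
```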

All of the above rolls up into one number. How the Strata Score is built →

Futures and derivatives carry substantial risk. Backtest results do not guarantee future returns. Strata is software — not investment advice.