Testing on data you didn't clean is testing on noise

I’ll be honest: the least glamorous part of building a backtest engine is the data. And it’s the part that decides whether your results mean anything.

Here’s what bit us, in order of how much time it cost:

Contract rolls. Futures aren’t one continuous instrument — they’re a chain of contracts (ESH5, ESM5, ESU5…). Stitch them naively and you get a phantom gap on every roll where the price “jumps” from the expiring contract to the next. Your bot sees a breakout that never happened. We resolve the most-liquid contract per window and keep the roll boundaries honest.
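
To make the phantom gap concrete, here is a minimal sketch of one way to handle it. It is not necessarily how our engine does it. It assumes daily bars for every listed contract, picks the highest-volume contract per day, and measures each price change within a single contract. The `DailyBar` type, the function names, and all prices and volumes are made up for illustration.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class DailyBar:
    day: date
    contract: str  # e.g. "ESH5"
    close: float
    volume: int


def most_liquid_per_day(bars):
    """Pick, for each day, the bar of the contract with the highest volume."""
    best = {}
    for bar in bars:
        current = best.get(bar.day)
        if current is None or bar.volume > current.volume:
            best[bar.day] = bar
    return [best[day] for day in sorted(best)]


def roll_days(series):
    """Days on which the selected contract differs from the previous day's."""
    return [cur.day for prev, cur in zip(series, series[1:])
            if cur.contract != prev.contract]


def changes_without_roll_gap(bars):
    """Day-over-day close changes, always measured within a single contract."""
    by_key = {(b.day, b.contract): b for b in bars}
    series = most_liquid_per_day(bars)
    changes = []
    for prev, cur in zip(series, series[1:]):
        # Compare against the *same* contract's previous close, so the spread
        # between the expiring and the next contract never shows up as a move.
        base = by_key.get((prev.day, cur.contract))
        if base is not None:
            changes.append((cur.day, cur.close - base.close))
    return changes


# Illustrative numbers only: ESM5 overtakes ESH5 in volume on the second day.
d1, d2, d3 = date(2025, 3, 10), date(2025, 3, 11), date(2025, 3, 12)
bars = [
    DailyBar(d1, "ESH5", 5000.0, 1000), DailyBar(d1, "ESM5", 5050.0, 200),
    DailyBar(d2, "ESH5", 5010.0, 400), DailyBar(d2, "ESM5", 5062.0, 900),
    DailyBar(d3, "ESH5", 5018.0, 100), DailyBar(d3, "ESM5", 5070.0, 1200),
]
series = most_liquid_per_day(bars)
naive_jump = series[1].close - series[0].close  # 62.0, mostly contract spread
clean = changes_without_roll_gap(bars)          # 12.0 on the roll day, then 8.0
```

Stitched naively, the roll day shows a 62-point move. Measured within ESM5, it is 12 points. A real roll rule would also need to stop the selection from flipping back and forth when volumes are close, which this sketch does not attempt.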

Bad prints and thin bars. A single zero-volume bar or a fat-fingered print on an illiquid contract can trigger a signal that’s pure noise. Energy contracts were the worst offenders for us: far less liquid than ES, with real gaps. We’d rather have a slightly smaller, clean dataset than a big dirty one.
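
A rough sketch of what such a filter can look like, assuming bars arrive as `(timestamp, close, volume)` tuples. The thresholds (`min_volume`, `window`, `max_dev`) are placeholder assumptions, not values we recommend, and the rolling-median check is just one common heuristic for spotting a bad print.

```python
from statistics import median


def clean_bars(bars, min_volume=1, window=5, max_dev=0.02):
    """Split bars into (kept, dropped). Each dropped entry is (bar, reason)."""
    kept, dropped = [], []
    # Closes of every non-thin bar, outliers included, so the reference
    # median can still adapt if the price level genuinely shifts.
    history = []
    for bar in bars:
        _, close, volume = bar
        if volume < min_volume:
            dropped.append((bar, "thin"))
            continue
        recent = history[-window:]
        history.append(close)
        if len(recent) >= 3:  # need a few bars before judging outliers
            ref = median(recent)
            if abs(close - ref) / ref > max_dev:
                dropped.append((bar, "outlier"))
                continue
        kept.append(bar)
    return kept, dropped


# Illustrative data: bar 3 has zero volume, bar 4 is a fat-fingered print.
bars = [
    (0, 70.00, 50), (1, 70.10, 40), (2, 70.05, 60),
    (3, 70.20, 0), (4, 75.00, 3), (5, 70.10, 45), (6, 70.15, 55),
]
kept, dropped = clean_bars(bars)
```

Keeping the reason next to each dropped bar matters more than the exact rule: it lets you audit how much data the filter removed and why, instead of trusting it blindly.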

Timezone drift. Exchange time, UTC, your broker’s clock, the data vendor’s clock: four ways to be wrong about when a bar happened. Be off by an hour and your “9:30 open” strategy is trading at 8:30. We normalize everything and keep timestamps tz-aware end to end.
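
A small sketch of the tz-aware idea using only the Python standard library. It assumes you know which zone each source's naive timestamps are in, stores everything as aware UTC, and converts to the exchange's local zone only when asking a session question. Treating "9:30" as New York time is an assumption made for this example, and the helper names are hypothetical.

```python
from datetime import datetime, time, timezone
from zoneinfo import ZoneInfo

# Assumption for this sketch: "9:30 open" means 9:30 in New York.
SESSION_TZ = ZoneInfo("America/New_York")


def to_aware_utc(naive_ts, source_tz):
    """Attach the source's zone to a naive timestamp, then convert to UTC."""
    if naive_ts.tzinfo is not None:
        raise ValueError("expected a naive timestamp")
    return naive_ts.replace(tzinfo=source_tz).astimezone(timezone.utc)


def is_session_open(ts, open_time=time(9, 30)):
    """True if an aware timestamp falls exactly on the local session open."""
    if ts.tzinfo is None:
        raise ValueError("refusing to interpret a naive timestamp")
    return ts.astimezone(SESSION_TZ).time() == open_time


# US daylight saving time started on 2025-03-09, so the UTC time of the
# 9:30 open moves from 14:30 to 13:30 across that weekend.
before_dst = to_aware_utc(datetime(2025, 3, 7, 14, 30), timezone.utc)
stale_rule = to_aware_utc(datetime(2025, 3, 7, 13, 30), timezone.utc)
after_dst = to_aware_utc(datetime(2025, 3, 10, 13, 30), timezone.utc)
chicago = to_aware_utc(datetime(2025, 3, 10, 8, 30), ZoneInfo("America/Chicago"))
```

Here `stale_rule` is 8:30 in New York: a hard-coded "13:30 UTC" rule that is right in summer is an hour early in winter. Refusing naive timestamps outright is a deliberate choice in this sketch, since a loud failure is cheaper than a silent one-hour shift.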

Coverage gaps. “Two years of 1-minute bars” sounds complete until you notice one symbol stops in April and another starts in June. A backtest silently running on half the window will lie to you confidently.
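
One way to make that failure loud instead of silent is to check coverage per symbol before the backtest runs. This is a sketch under stated assumptions: the expected calendar is simply every weekday in the window (holidays ignored), the symbols and dates are illustrative, and the 95% threshold is arbitrary.

```python
from datetime import date, timedelta


def weekdays(start, end):
    """All Mon-Fri dates in [start, end]. Holidays are ignored in this sketch."""
    days, d = [], start
    while d <= end:
        if d.weekday() < 5:
            days.append(d)
        d += timedelta(days=1)
    return days


def coverage_report(days_by_symbol, expected_days):
    """First day, last day, and share of expected days present, per symbol."""
    expected = set(expected_days)
    report = {}
    for symbol, days in days_by_symbol.items():
        seen = set(days) & expected
        report[symbol] = {
            "first": min(seen) if seen else None,
            "last": max(seen) if seen else None,
            "coverage": len(seen) / len(expected),
        }
    return report


def require_coverage(report, min_coverage=0.95):
    """Raise instead of letting a backtest run on a partial window."""
    short = {s: round(r["coverage"], 3) for s, r in report.items()
             if r["coverage"] < min_coverage}
    if short:
        raise ValueError(f"insufficient coverage: {short}")


# Illustrative data: one symbol stops in April, another starts in June.
window = weekdays(date(2024, 1, 1), date(2024, 12, 31))
days_by_symbol = {
    "ES": window,
    "CL": [d for d in window if d <= date(2024, 4, 30)],
    "NG": [d for d in window if d >= date(2024, 6, 3)],
}
report = coverage_report(days_by_symbol, window)
```

Calling `require_coverage(report)` on this data raises a `ValueError` naming `CL` and `NG`. The same idea applies to 1-minute bars; you would count bars per session rather than days per year.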

None of this is exciting. All of it changes your numbers. Before you trust a backtest, ask the unglamorous question first: is the data underneath it actually clean? If you can’t answer that, the Sharpe ratio on top of it is decoration.