7 min read · Methodology · Microstructure

The bar-coarsening discriminator: when a Sharpe of 3 is just stale closes

A reversion edge that looks clean on hourly bars and evaporates on daily bars was never reversion — it was bid-ask bounce in the close. The cheapest test we have for telling the two apart, and the candidates it has killed.

A cross-instrument reversion or cointegration backtest on hourly closes is one of the easier things in this business to make look good. Build a spread between two related instruments, z-score it on a rolling window, trade the extremes, subtract a cost. On crypto majors the result is often a clean, positive, net-of-cost Sharpe, the kind of number that, if you stopped there, would walk straight into a strategy doc. Before it does, resample the same data to four-hour and one-day bars and run it again with everything else unchanged. If the edge holds, you may have something. If it collapses, you never did: you were trading the bid-ask bounce in the hourly close.
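For concreteness, the recipe in the paragraph above might look like the sketch below. It is a minimal illustration, not our production backtester: the function names, the fixed hedge ratio `beta`, the entry and exit thresholds, and the flat per-trade cost are all assumptions made for the example. Note that it fills at the same close that generated the signal, which is exactly the habit that lets a sampling artifact pass as profit.

```python
import math
from statistics import mean, pstdev


def make_spread(leg_a, leg_b, beta=1.0):
    """Log-price spread between two legs; beta is an assumed, fixed hedge ratio."""
    return [math.log(a) - beta * math.log(b) for a, b in zip(leg_a, leg_b)]


def rolling_zscore(spread, window):
    """z-score of each point against the preceding `window` points (no look-ahead)."""
    z = [None] * len(spread)
    for i in range(window, len(spread)):
        hist = spread[i - window:i]
        sd = pstdev(hist)
        if sd > 0:
            z[i] = (spread[i] - mean(hist)) / sd
    return z


def backtest(spread, window=20, entry=2.0, exit_=0.5, cost=0.0):
    """Trade the extremes of the z-score; return net PnL per closed trade.

    PnL is in spread units. Fills are assumed at the signal bar's close,
    which is the naive convention this article warns about.
    """
    z = rolling_zscore(spread, window)
    pos, entry_px, trades = 0, 0.0, []
    for px, zi in zip(spread, z):
        if zi is None:
            continue
        if pos == 0:
            if zi >= entry:        # spread rich: short it
                pos, entry_px = -1, px
            elif zi <= -entry:     # spread cheap: long it
                pos, entry_px = 1, px
        elif abs(zi) <= exit_:     # back near the rolling mean: close out
            trades.append(pos * (px - entry_px) - cost)
            pos = 0
    return trades
```

Running `backtest` on the same spread rebuilt at 1h, 4h, and 1d bars, with the parameters held fixed, is the comparison the rest of this piece is about.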

Why an hourly close lies

A kline close is the last trade — or, on quieter instruments, the last mark — printed before the bar boundary. On anything but the most liquid book that print carries two contaminations. First, bid-ask bounce: the last trade sits at the bid or the ask, not the mid, and which one is effectively a coin flip. Second, staleness: if no trade lands near the boundary, the close is a quote that is minutes old. A spread built from two such closes inherits both contaminations from both legs, independently. The result is a high-frequency wobble that mean-reverts almost perfectly — because it is noise around a slow-moving true relationship, and noise reverts by construction.

A z-score reversion rule loves this wobble. It enters when the spread is two bounces wide and exits when the bounces unwind, and it books the difference as profit. The profit is not real. You cannot sell your own bid-ask bounce; the price you would actually transact at never moved the way the close did. The backtest is measuring an artifact of how the bar was sampled, not a tradeable dislocation.
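The wobble is easy to reproduce. The sketch below is a toy simulation under stated assumptions, not a model of any particular venue: a "true" value follows a slow random walk, and each observed close lands half a spread above or below it on a coin flip. The names `simulate_closes` and `lag1_autocorr` and all parameter values are made up for illustration. When the bounce is large relative to the true move, close-to-close changes show a strongly negative lag-1 autocorrelation, which is what a z-score rule reads as reversion.

```python
import random


def simulate_closes(n, half_spread, mid_vol, seed=0):
    """Observed closes = slow-moving true value plus a coin-flip bid/ask offset."""
    rng = random.Random(seed)
    mid, closes = 100.0, []
    for _ in range(n):
        mid += rng.gauss(0.0, mid_vol)       # slow-moving true value
        side = rng.choice((-1.0, 1.0))       # last print at the bid or the ask
        closes.append(mid + side * half_spread)
    return closes


def lag1_autocorr(xs):
    """Lag-1 autocorrelation of the first differences of xs."""
    diffs = [b - a for a, b in zip(xs, xs[1:])]
    m = sum(diffs) / len(diffs)
    num = sum((x - m) * (y - m) for x, y in zip(diffs, diffs[1:]))
    den = sum((x - m) ** 2 for x in diffs)
    return num / den


bouncy = simulate_closes(5000, half_spread=0.05, mid_vol=0.01)
clean = simulate_closes(5000, half_spread=0.0, mid_vol=0.01)
print("with bounce:   ", round(lag1_autocorr(bouncy), 3))
print("without bounce:", round(lag1_autocorr(clean), 3))
```

With independent bounce of half-width h and true per-bar volatility s, the lag-1 autocorrelation of changes is -h^2 / (s^2 + 2h^2), which approaches -0.5 as the bounce dominates. None of that apparent reversion is a price you could have traded.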

The test

Coarsen the bar and re-run. Four-hour, then one-day. Plot the net-of-cost result — Sharpe, or better, per-trade basis points — against bar width. Two shapes:

  • Monotonic collapse toward zero. The edge was the sampling artifact. Widening the bar averages the bounce away, and with it the “edge.” This is the overwhelmingly common case on crypto stat-arb.
  • Roughly flat. The per-trade magnitude survives coarsening. The relationship lives in the price, not in the close, and the candidate earns a closer look — cost realism, capacity, walk-forward — none of which this test replaces.

The mechanism in one line: artifacts live at the sampling frequency; real relationships live in the price. Coarsening the bar strips the former and keeps the latter.
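Reading the two shapes off a plot is usually enough, but the check can also be automated. The helper below is a hypothetical sketch: `classify_coarsening` is not part of any library, and the 0.7 and 0.3 retention thresholds are arbitrary placeholders that would need tuning to your own data. It takes net per-trade basis points keyed by bar width in hours and labels the curve.

```python
def classify_coarsening(per_trade_bps, flat_floor=0.7, collapse_ceiling=0.3):
    """Label how a net per-trade edge behaves as the bar widens.

    per_trade_bps maps bar width in hours to net per-trade basis points.
    The thresholds are illustrative assumptions, not calibrated values.
    """
    widths = sorted(per_trade_bps)
    values = [per_trade_bps[w] for w in widths]
    if len(values) < 2:
        raise ValueError("need at least two bar widths")
    base = values[0]
    if base <= 0:
        return "no edge at finest bar"
    retained = values[-1] / base  # share of the finest-bar edge left at the coarsest bar
    falling = all(later < earlier for earlier, later in zip(values, values[1:]))
    if retained >= flat_floor:
        return "flat"
    if falling and retained <= collapse_ceiling:
        return "collapse"
    return "ambiguous"


print(classify_coarsening({1: 12.0, 4: 5.0, 24: 1.0}))
print(classify_coarsening({1: 12.0, 4: 11.5, 24: 10.0}))
```

Anything labelled "ambiguous" goes back to a human with the plot; the labels are a triage aid, not a verdict.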

Two it has killed

On one venue, a cointegration screen showed an out-of-sample Sharpe near 1.9 at one-hour bars. At one-day bars the same pairs gave 0.4, a clean tell that the hourly number was stale-close reversion on a slow book. On a second venue, a crypto cointegration line read 3.0 at 1h, 1.1 at 4h, and 0.5 at 1d: a monotonic, roughly geometric decay in which each coarsening step cut the Sharpe by more than half, which is the fingerprint of an artifact being averaged away as the bar widens. Both numbers were, at 1h, good enough to mistake for a result. Neither survived contact with a wider bar.
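One way to put a number on the decay, offered as a sketch and not as the measure we used at the time, is the implied power-law exponent between adjacent bar widths: if the Sharpe scales as width to the power of minus k, then k = ln(S0 / S1) / ln(w1 / w0). The function name `decay_exponents` is hypothetical.

```python
import math


def decay_exponents(widths_h, values):
    """Implied exponent k for each adjacent pair, assuming value ~ width ** (-k).

    k near 0 means the figure barely moves with bar width; larger k means
    a steeper fall. Requires strictly increasing widths and positive values.
    """
    pairs = list(zip(widths_h, values))
    return [
        math.log(v0 / v1) / math.log(w1 / w0)
        for (w0, v0), (w1, v1) in zip(pairs, pairs[1:])
    ]


# Second venue's Sharpe figures at 1h, 4h and 1d bars.
for k in decay_exponents([1, 4, 24], [3.0, 1.1, 0.5]):
    print(round(k, 2))
```

For the second venue's figures this works out to roughly 0.72 for the 1h to 4h step and 0.44 for the 4h to 1d step. A series that held its value across widths would give exponents near zero.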

The test cuts the other way too, which is what makes it a discriminator and not just a wrecking ball. A term-structure relationship we looked at held its per-trade magnitude from 1h through 4h and weakened only mildly at 1d — and that mild weakening had an independent, expected explanation (fewer, longer-held trades at the daily horizon), not the geometric collapse of a bounce artifact. A real relationship degrades gracefully under coarsening; an artifact falls off a cliff.

The engineering pitfall

You have to rebuild the coarse bars from the underlying, not decimate the fine series. Taking every fourth one-hour close to make a “four-hour” series keeps the bounce — you have thrown away three quarters of your data and none of the contamination. Aggregate properly: open from the first sub-bar, high and low as the extremes, close as the last sub-bar's close, volume summed. And note a residual subtlety — a one-day close that you build by taking the last hourly close of the day still inherits that hourly close's staleness. Where the venue serves native daily bars, prefer them; the locally-aggregated version slightly understates the very staleness it is trying to expose. So a candidate that collapses even under aggregation is a conclusive kill, while one that survives aggregation still deserves the native-bar check before you believe it.
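The aggregation rule above can be sketched in a few lines. This is an illustration under assumptions: the `Bar` tuple and `aggregate` function are hypothetical names, the input is taken to be gap-free, and the first bar is assumed to sit on a coarse-bar boundary. Real data needs timestamp-based grouping and explicit handling of missing bars.

```python
from typing import List, NamedTuple


class Bar(NamedTuple):
    open: float
    high: float
    low: float
    close: float
    volume: float


def aggregate(bars: List[Bar], k: int) -> List[Bar]:
    """Combine each consecutive group of k fine bars into one coarse bar.

    Assumes bars[0] is aligned to a coarse-bar boundary and that there are
    no gaps. A trailing partial group is dropped instead of being emitted
    as a short bar.
    """
    out = []
    for i in range(0, len(bars) - k + 1, k):
        group = bars[i:i + k]
        out.append(Bar(
            open=group[0].open,                    # open from the first sub-bar
            high=max(b.high for b in group),       # extreme high
            low=min(b.low for b in group),         # extreme low
            close=group[-1].close,                 # close from the last sub-bar
            volume=sum(b.volume for b in group),   # volume summed
        ))
    return out


hourly = [
    Bar(10.0, 11.0, 9.0, 10.5, 1.0),
    Bar(10.5, 12.0, 10.0, 11.0, 2.0),
    Bar(11.0, 11.5, 8.0, 9.0, 3.0),
    Bar(9.0, 9.5, 8.5, 9.2, 4.0),
]
print(aggregate(hourly, 4))
```

The `close` field is still the last sub-bar's close, so the residual staleness described above carries through; that is the reason for the native-bar check.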

What this is not

This is one rail, not the analysis. It is orthogonal to the others and replaces none of them: false-discovery control handles the fact that you tested many pairs; walk-forward folds handle the fact that one historical window can be lucky; a real cost model handles the fact that the spread you assumed is probably wrong. Bar-coarsening handles one specific failure — the sampling artifact — that those three do not catch, because an artifact can be statistically significant, fold-consistent, and survive a generous cost assumption while still being untradeable. A candidate should pass all four. In our experience on crypto relative value, this is the rail that fires first and most often, which is why it is worth running before you spend a day on the others.