053 – Martyn Tinsley – Walk Forward Correlation: A New Tool for Robust Strategy Design!

Test Your Model, Not Its Parameters

 

“Not everything that counts can be counted, and not everything that can be counted counts.”

 

That line, usually pinned to Einstein, fits this article rather well. In trading strategy research, we can spend a long time counting the wrong thing: for example, as Martyn Tinsley points out, whether the single best in-sample parameter set survives out-of-sample testing. His new approach, Walk Forward Correlation, argues that this is often a comforting illusion. Conversely, the traditional approach can also lead you to throw away a potentially profitable strategy just because it fails on one parameter set out-of-sample (OOS). What matters is not whether one lucky setting survives, but whether the entire optimisation surface carries information from in-sample to out-of-sample performance.


The setup

Martyn introduces Walk Forward Correlation (WFC) as a diagnostic for two problems that sit at the heart of systematic trading: detecting over-fitting and identifying genuine structural edge. Traditional walk-forward validation typically optimises a strategy on an in-sample window, picks the “best” parameter set, then tests that one choice out-of-sample. Used the wrong way, this has a potential flaw: one parameter set can look good out-of-sample purely by accident (for the statisticians out there, because of sampling variance). A single lucky pass tells you very little about whether the underlying model is genuinely robust.

Tinsley’s move is simple, but useful. Instead of judging one selected point, he looks at all parameter combinations in the optimisation grid and asks a harder question: does strong in-sample performance tend to map to strong out-of-sample performance across the whole space? If yes, you may have something real. If no, you’re probably flattering noise.


There’s a Tool for That!

Martyn’s ‘Opt My Strategy’ (OMS) optimization tool automatically captures your backtests and optimizations into quant-grade dashboards for advanced analysis and decision-making. OMS uses advanced techniques to identify the parameter values with the best genuine, repeatable edge.

Within a few weeks, it will support Walk Forward Correlation, and I’m working with him to see if we can use it with RealTest. For up to 25% off (and to support the show!), check it out here: https://algoadvantage.io/toolbox

You’ll also find there the link to his WFC research paper on SSRN.

Martyn is in the Collective, along with 80 other very bright traders & experienced managers. There’s a three-day free trial so it’s worth checking out! The bonus section with Martyn covers his entire 14-step process for robust strategy development!

Members Forum

What WFC actually measures

WFC computes the correlation between in-sample and out-of-sample results across the full parameter grid. The default is the weighted Pearson correlation, though the paper notes alternatives such as Spearman, Kendall’s tau, and distance correlation. The logic is clean: if the in-sample surface contains genuine predictive information, then parameter combinations that rank well in-sample should, broadly speaking, also rank well out-of-sample, and the correlation will be higher as a result.

That’s the key distinction.

WFC is not asking, “Did my favourite parameter survive?” It is asking, “Was the optimisation process itself informative?”

A high positive correlation suggests that the strategy’s performance is being driven by something stable enough to survive the train-test split. A low or near-zero correlation suggests that in-sample results tell you almost nothing useful about what happens next. A negative correlation is worse: it hints at instability, inversion, or regime change.
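
For reference, the weighted Pearson coefficient behind these readings can be written as below. The notation is an assumption made for this note rather than taken from the paper: x_k and y_k are the in-sample and out-of-sample results for parameter combination k, and w_k is its weight. The article does not say how the weights are chosen; with equal weights the expression reduces to the ordinary Pearson correlation.

```latex
\bar{x}_w = \frac{\sum_{k} w_k x_k}{\sum_{k} w_k},
\qquad
\bar{y}_w = \frac{\sum_{k} w_k y_k}{\sum_{k} w_k}

\mathrm{WFC} =
\frac{\sum_{k} w_k \,(x_k - \bar{x}_w)(y_k - \bar{y}_w)}
     {\sqrt{\sum_{k} w_k \,(x_k - \bar{x}_w)^2}\;
      \sqrt{\sum_{k} w_k \,(y_k - \bar{y}_w)^2}}
```

The value lies between -1 and +1: close to +1 when the two surfaces rise and fall together, around 0 when they are unrelated, and negative when good in-sample regions tend to be bad out-of-sample.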

Visually, here’s what we are talking about. What you want to know is whether the result of each parameter combination in-sample is correlated with the result for the same combination out-of-sample. If it is, the two 3D plots take similar shapes, showing that not just one but all parameter combinations tended to behave out-of-sample as they did in-sample.

The WFC approach to optimization validation

This doesn’t help you choose which parameter set you’ll go live with; that’s a different process. It only seeks to tell you whether the model is robust and has not been over-fitted.


Correlation is not edge

This is where Martyn’s work avoids a common trap. Correlation alone does not prove structural edge. It only proves predictive consistency. For structural edge to exist, you also need positive out-of-sample performance. In other words, a model can be consistently predictive and still be consistently bad. That sounds absurd until you’ve spent enough time around trading systems. Then it sounds normal.

Martyn is currently developing the next Walk Forward Correlation paper jointly with a leading university in the UK. It introduces a supplementary extension to WFC that provides a statistically robust way of determining the strategy’s edge. It will be published in a Quantitative Finance journal, so keep an eye out for that.

The original paper’s diagnostic matrix provides a more qualitative approach as follows:

  • High correlation + positive OOS performance: likely structural edge, low over-fitting risk
  • Low correlation + positive OOS performance: probably spurious, over-fit luck
  • High correlation + negative OOS performance: stable, but loss-making
  • Low correlation + negative OOS performance: noise, no edge

That framework is useful because it separates predictability from profitability. Most traders muddle the two.
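
The four quadrants above can be written as a small lookup. This is only a sketch: the article gives no numeric cut-off for "high" correlation, so `wfc_threshold=0.5` is an arbitrary placeholder, and `oos_performance` is assumed to be some aggregate out-of-sample figure (for example the mean result across the grid), which is also a choice made for this example.

```python
def diagnose(wfc, oos_performance, wfc_threshold=0.5):
    """Map a WFC value and an aggregate OOS result to one of the four quadrants.

    wfc_threshold is an illustrative assumption, not a value from the paper.
    oos_performance is any aggregate OOS figure where > 0 means profitable.
    """
    high_correlation = wfc >= wfc_threshold
    profitable = oos_performance > 0

    if high_correlation and profitable:
        return "likely structural edge, low over-fitting risk"
    if not high_correlation and profitable:
        return "probably spurious, over-fit luck"
    if high_correlation and not profitable:
        return "stable, but loss-making"
    return "noise, no edge"


# Example: predictable but unprofitable.
print(diagnose(wfc=0.8, oos_performance=-0.3))
```

In practice the threshold deserves more thought than a single number, which is presumably part of what the forthcoming statistical extension is meant to address.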


The Procedure

In-sample optimisation is performed first, then the same parameter grid is swept out-of-sample. The OOS sweep isn’t there to discover the optimal set of parameters; it’s performed solely to produce the WFC metrics, so that we can measure the correlation between the IS and OOS results. If IS and OOS results are highly correlated, as below, then the assumption is that the model’s behaviour is robust (it has some predictive consistency). Pearson correlation is the default, but Spearman, Kendall’s tau, or distance correlation may also be used.

Example of High WFC

So WFC is seeking to answer, “Does performance in-sample tell us anything meaningful about performance out-of-sample?” It does not by itself tell us if the system is profitable, or what parameters to actually select.
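
The procedure can be sketched in plain Python. This is a minimal illustration, not Martyn's or OMS's implementation: the parameter names `fast` and `slow`, the toy `true_surface` function standing in for a real backtest metric, and the noise levels are all invented for the example. It uses ordinary (unweighted) Pearson for simplicity, with Spearman available as Pearson applied to ranks.

```python
import math
import random
from itertools import product


def pearson(xs, ys):
    """Plain (unweighted) Pearson correlation; assumes neither input is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def average_ranks(values):
    """1-based ranks, with tied values sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def walk_forward_correlation(grid, score_is, score_oos, method="pearson"):
    """Score every parameter combination in both windows, then correlate."""
    is_scores = [score_is(params) for params in grid]
    oos_scores = [score_oos(params) for params in grid]
    if method == "spearman":
        return pearson(average_ranks(is_scores), average_ranks(oos_scores))
    return pearson(is_scores, oos_scores)


# Hypothetical two-parameter grid (names and ranges are illustrative only).
grid = list(product(range(5, 30, 5), range(20, 120, 20)))
rng = random.Random(42)


def true_surface(params):
    """Toy stand-in for a backtest metric: a smooth hill over the grid."""
    fast, slow = params
    return 1.0 - ((fast - 15) / 10) ** 2 - ((slow - 60) / 40) ** 2


# Structured case: both windows share one surface, each with its own noise.
structured = walk_forward_correlation(
    grid,
    score_is=lambda p: true_surface(p) + rng.gauss(0, 0.1),
    score_oos=lambda p: true_surface(p) + rng.gauss(0, 0.1),
)

# Noise case: in-sample and out-of-sample scores are unrelated.
noise = walk_forward_correlation(
    grid,
    score_is=lambda p: rng.gauss(0, 1),
    score_oos=lambda p: rng.gauss(0, 1),
)

print(f"structured WFC: {structured:.2f}")
print(f"noise WFC:      {noise:.2f}")
```

The structured case should come out strongly positive because both windows share the same underlying surface, while the noise case should typically sit much closer to zero. In real use, `score_is` and `score_oos` would be replaced by backtests over the in-sample and out-of-sample windows.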

A major theme is that WFC should be read alongside the shape of the optimisation surface.

A smooth topology means nearby parameter values produce similar performance. This supports robustness because the result is not dependent on one fragile parameter combination.

A chaotic topology means small parameter changes cause large performance swings. This suggests instability, over-fitting, or noise.
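
One rough way to put a number on "smooth" versus "chaotic" is to compare each grid cell with its immediate neighbours. The measure below is an assumption made for illustration, not something taken from the paper: it averages the absolute difference between adjacent cells of a two-parameter surface and scales it by the surface's overall range.

```python
def roughness(surface):
    """Mean absolute difference between adjacent grid cells, scaled by the range.

    surface is a list of equal-length rows, one value per parameter combination.
    Values near 0 suggest a smooth surface; values near 1 suggest a jagged one.
    """
    rows, cols = len(surface), len(surface[0])
    diffs = []
    for r in range(rows):
        for c in range(cols):
            if c + 1 < cols:  # neighbour to the right
                diffs.append(abs(surface[r][c] - surface[r][c + 1]))
            if r + 1 < rows:  # neighbour below
                diffs.append(abs(surface[r][c] - surface[r + 1][c]))
    flat = [value for row in surface for value in row]
    span = max(flat) - min(flat)
    if span == 0 or not diffs:
        return 0.0  # a flat or single-cell surface has no roughness to measure
    return sum(diffs) / len(diffs) / span


# A gentle slope versus a checkerboard that flips at every step.
smooth = [[r + c for c in range(5)] for r in range(5)]
chaotic = [[(r + c) % 2 for c in range(5)] for r in range(5)]

print(roughness(smooth))   # 0.125
print(roughness(chaotic))  # 1.0
```

Scaling by the range keeps the number comparable across metrics with different units, at the cost of being sensitive to a single extreme cell.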


What this means in practice

What I take from this is straightforward: robust strategy research should evaluate the geometry of the whole parameter space, not just the winner. Smooth surfaces, where small parameter changes produce small performance changes, are more likely to retain alignment between in-sample and out-of-sample results. Chaotic surfaces are usually where over-fitting goes to hide. And remember, this is all about confirming stability of the model, not finding the right parameter set you’ll go live with (although that becomes much easier, and can be performed with much more confidence, after WFC analysis).

WFC does not replace other tools. Martyn explicitly positions it as a complement to methods such as traditional walk-forward analysis, deflated Sharpe ratio, probability of backtest overfitting, White’s Reality Check, and purged cross-validation. But it adds something those methods often miss: a direct read on whether the optimisation surface contains genuine transferable information.


Final takeaway

For traders, the message is sharp. Stop being impressed because one parameter set held up once. Ask whether the surface holds up.

For quants, WFC is useful because it shifts the focus from selection to structure.

For anyone building trading systems, the real lesson is brutally simple: if your in-sample rankings don’t broadly survive out-of-sample across the parameter space, you probably don’t have an edge. You have a souvenir from the past.

Get in Touch with Martyn

Website

X

LinkedIn