Strategy testing: is it enough?

Hey everyone,

I wanted to touch on a topic that I don’t think is discussed nearly enough here, and that topic is backtesting. How reliable is it really?

Most people would assume that backtest results are solid. You get a backtest with a 74% success rate, and you think you've won the lottery! However, there are some grey areas when it comes to backtesting. In fact, backtesting should only be the first step in multiple phases one should go through to ensure a strategy is indeed profitable.

First, let’s dispel some myths about accuracy vs. profitability.

High accuracy = high profitability?
This is false. High accuracy does not always mean profitability. Two considerations matter here:

- At what point are you taking profits?
If a buy signal occurs and you take profits only about $0.50 above the entry, the strategy will rack up plenty of wins, but it is not a feasible strategy or one with a good risk-reward (R:R) ratio.

- How long are you holding?
If the strategy has high accuracy but requires you to hold for 2 to 3 years before seeing profits, it defeats the purpose of most trading strategies. At that point it is simply an investment strategy, which is a solid approach in itself but not what a trader is testing for.

These are two common issues I see in strategies that lead to misleading “accuracy” results.

Low accuracy = not profitable?
This is false. Low accuracy strategies can be among the best strategies because their focus is usually on holding for major targets, with strict stop-loss parameters. You will be profitable infrequently, but when you win, you will win big.

A real-life example of this would be Michael Burry’s successful short. While that trade became the subject of books and movies, his multiple failed attempts at major shorts before and after it have been overshadowed by the success of his 2008 bubble short. Thus, Michael Burry has a low accuracy but a high profitability.
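To make both myths concrete, here is a minimal Python sketch of the average profit per trade (often called expectancy). The win rates and dollar amounts are hypothetical, picked only to illustrate the point, and the function name is my own.

```python
def expectancy(win_rate: float, avg_win: float, avg_loss: float) -> float:
    """Average profit per trade: p * avg_win - (1 - p) * avg_loss."""
    return win_rate * avg_win - (1.0 - win_rate) * avg_loss


# Hypothetical strategy A: wins 90% of the time, but takes profit at $0.50
# and gives back $6.00 on each loser.
high_accuracy = expectancy(win_rate=0.90, avg_win=0.50, avg_loss=6.00)

# Hypothetical strategy B: wins only 30% of the time, but holds for $10.00
# targets with a strict $2.00 stop.
low_accuracy = expectancy(win_rate=0.30, avg_win=10.00, avg_loss=2.00)

print(f"90% accuracy: {high_accuracy:+.2f} per trade")
print(f"30% accuracy: {low_accuracy:+.2f} per trade")
```

With these made-up numbers, the 90% strategy loses about $0.15 per trade on average, while the 30% strategy makes about $1.60.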

How can we better decide on successful strategies?
This is the question that any day or swing trader should be asking: How do we validate the efficacy or efficiency of our strategy? This is where things get somewhat complicated. The emphasis I see in the trading community is on just general accuracy and profit factor. I also see some discussions on Sharpe ratios. I think it’s important to understand these concepts before we continue.

Accuracy: Accuracy is simply the number of successful trades over the total number of trades, multiplied by 100. So, 49 successful trades out of 50 total trades would equal an accuracy of 98%.

Profit factor: Profit factor is the total gross profits divided by the total gross losses over the course of the strategy testing period. For example, if over the last 4 weeks, you made $800 and lost $250, your profit factor would be 800/250 = 3.2.
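The two definitions above can be sketched in a few lines of Python. The individual trade results in `pnls` are hypothetical; they are chosen only so that they add up to the $800 won and $250 lost from the example.

```python
def accuracy(wins: int, total: int) -> float:
    """Winning trades over total trades, as a percentage."""
    return 100.0 * wins / total


def profit_factor(trade_pnls: list[float]) -> float:
    """Gross profits divided by gross losses over the test period."""
    gross_profit = sum(p for p in trade_pnls if p > 0)
    gross_loss = -sum(p for p in trade_pnls if p < 0)
    if gross_loss == 0:
        return float("inf")  # no losing trades in the sample
    return gross_profit / gross_loss


# Hypothetical trades: $800 in gross profits, $250 in gross losses.
pnls = [500.0, -150.0, 300.0, -100.0]

print(accuracy(49, 50))     # 98.0
print(profit_factor(pnls))  # 3.2
```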

Sharpe Ratio: Sharpe ratios are slightly more complex. This ratio attempts to evaluate the risk-adjusted return of an investment, portfolio, or trading strategy. It works by taking the average return and subtracting the risk-free rate. The risk-free rate can be something like the yield on government Treasury bills or a simple high-interest savings rate. Then, you take the remaining value (the excess return) and divide it by the standard deviation of the returns.

For example, let’s say your strategy generally yields 10%. The risk-free rate of a high-interest savings account is 2%. The standard deviation of your strategy’s returns is around 15% (this would be calculated by taking all of the returns from your strategy, both positive and negative, and calculating their standard deviation). In this case, the Sharpe ratio would equal (10 - 2) / 15 = 0.53. A Sharpe ratio >2 is considered excellent, and one between 1 and 2 is considered good. Most strategies realistically come in at <1.
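The Sharpe calculation described above can be sketched in Python as follows. The function names are my own, and the sketch assumes the returns and the risk-free rate are expressed over the same period (for example, both annual). Note that `statistics.stdev` is the sample standard deviation.

```python
from statistics import mean, stdev


def sharpe_from_summary(avg_return: float, risk_free: float, std_dev: float) -> float:
    """(average return - risk-free rate) / standard deviation of returns."""
    return (avg_return - risk_free) / std_dev


def sharpe_from_returns(returns: list[float], risk_free: float) -> float:
    """Same ratio, computed from a list of per-period returns."""
    return sharpe_from_summary(mean(returns), risk_free, stdev(returns))


# The worked example from the text: 10% yield, 2% risk-free, 15% std dev.
print(round(sharpe_from_summary(0.10, 0.02, 0.15), 2))  # 0.53
```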

TradingView’s strategy tester actually provides you with a calculation of the Sharpe ratio. Simply apply a strategy to your chart and head over to the “performance summary” tab:
[Snapshot: the strategy tester’s “performance summary” tab]

In general, though, you should treat any backtested Sharpe ratio >1 with extreme skepticism.

So, are these approaches enough to determine how successful a strategy will be?
No, absolutely not. Even with a good Sharpe ratio, an okay accuracy, and a high profit factor, you cannot be guaranteed that the strategy will be successful.

Why not?
This is a complex question, and I think it’s best answered from a biostatistics perspective (mostly because this is my field, haha).

In biostatistics and epidemiology, we have a concept that maps closely onto stocks. It’s called a “web of causation.” What this means is that numerous factors influence a person’s health, and it is very challenging to control and account for all of them.

Take a make-believe person, Mrs. Jones, and her family. At first glance, Mrs. Jones and her family may appear well-dressed, affluent, well-groomed, and healthy. Now, let’s say we want to trade based on Mrs. and Mr. Jones’ likelihood of living to 80 years old (we are playing the role of an insurance actuary now, haha). The only information we have on this family is that they appear affluent, show no signs of illness, and are pleasant people.

Believe it or not, this is about all the information we have at a single point in time on a stock. That’s all we can really know at the time of trade execution. We can speculate further, but we can’t really know all of the impacting factors on the stock.

Now, let’s say we buy calls on the Jones family living to 80 based on what we observe. Twelve years pass, and Mr. Jones ends up ill and in the hospital. Two months later, he sadly passes away. Then, 1.5 years after that, Mrs. Jones sadly passes away from cancer.
Your position is now worthless.

What happened?
We ignored, or were not able to see, the full picture. Despite appearances, the Jones family had a lower socioeconomic status. Mr. Jones drank more than 4 alcoholic drinks per day. They lived in an older home that did not have sufficient insulation and protection from the elements. They also lived beneath a power grid distribution zone and right next to a high EMF emitting cellphone tower that was constructed right after the family moved in 11 years ago. Mrs. Jones’ relatives had all died from cancer 2 years ago, before reaching the age of 68, and Mr. Jones’ family had a history of health issues and alcoholism.

We can visualize a web of causation through this image:
[Snapshot: web of causation for the Jones family]

Some of these things we could have found out, namely the socioeconomic status and Mr. Jones’ history of alcoholism. However, most of these things did not appear until partway through our bet. For example, at the time, we did not know that a high EMF emitting tower would be built right next to their house, and Mrs. Jones’ relatives did not die until well into our position.

So how could we have known?
The truth is, for the most part, we couldn’t have. We could have done better due diligence by obtaining the current and most recent family history and socioeconomic situation. We could have obtained information on the location and house the family was living in. But most of these things happened along the way, and it would have been impossible to foresee them.

This is the reality of stock trading. It is impossible to know what the future holds for a company or the economy. The stock market has a multifaceted web of causation: the current economic status of a country, global affairs, war, who holds the presidency, a company’s overall financial stability, unexpected lawsuits, unexpected losses, bankruptcies, interest rates, and other economic disasters.

Here’s what a web of causation could look like for the stock market:
[Snapshot: web of causation for the stock market]

So, what can we do?
Here are some tips for ensuring that we capture the most accurate picture we can of a strategy. We’ll start with some easy, quick-to-implement approaches and then go into some more advanced, higher-level approaches.

Easier approaches:

- Ensure you use a larger lookback period. TradingView has a feature called “deep backtesting.” This allows you to backtest a strategy over many weeks, months, or years of history. Make use of this function! One of the biggest issues with strategy backtesting is focusing on a limited lookback period. This introduces bias and omits a vast amount of data.
[Snapshot: TradingView’s deep backtesting option]

- Analyze the statistics presented in TradingView’s backtester performance summary. Be very skeptical of Sharpe ratios >= 1.2 and profit factors >= 1.5. Make sure you look at the entries and exits of the strategy, and the average trade length and profit:
[Snapshot: backtester performance summary statistics]

- Warning signs to look for are an abnormally long period of time in a trade and frequent trades with marginal profits. Be sure the trade length is proportionate to the timeframe you are on: for example, 150 bars on the daily chart is roughly seven months of trading days!

Advanced Approaches:

Most quantitative traders and financial institutions apply something called forward testing. As I use the term here, forward testing includes a number of statistical tests that can help determine whether the results of the backtest are statistically significant. For example, a simple Chi-Square test can check whether the split between winning and losing trades differs significantly from what chance alone (a 50/50 split) would produce. A t-test can compare your strategy’s returns against those of a bond or fixed-interest account, to check whether there is a statistically significant difference between the profits yielded by your strategy and a safe investment or high-interest savings position.
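As a rough illustration of the two tests just mentioned, here is a minimal Python sketch using only the standard library. It is not the SPTS library or any institutional method. The function names, the 60/40 win/loss counts, and the return series are all hypothetical. The Chi-Square p-value shortcut below is only valid for one degree of freedom (two categories: wins and losses).

```python
import math
from statistics import mean, stdev


def chi_square_wins_vs_losses(wins: int, losses: int) -> tuple[float, float]:
    """Chi-Square goodness-of-fit test against a 50/50 split (1 degree of freedom)."""
    expected = (wins + losses) / 2
    chi2 = ((wins - expected) ** 2 + (losses - expected) ** 2) / expected
    # For 1 degree of freedom, P(chi-square > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


def t_statistic_vs_benchmark(returns: list[float], benchmark_return: float) -> float:
    """One-sample t statistic: is the mean return different from a fixed benchmark?"""
    standard_error = stdev(returns) / math.sqrt(len(returns))
    return (mean(returns) - benchmark_return) / standard_error


# Hypothetical backtest: 60 winning trades, 40 losing trades.
chi2, p_value = chi_square_wins_vs_losses(60, 40)
print(f"chi-square = {chi2:.2f}, p = {p_value:.4f}")

# Hypothetical per-period strategy returns vs. a 1% savings-account return.
strategy_returns = [0.02, 0.04, 0.03, 0.05, 0.01]
t_stat = t_statistic_vs_benchmark(strategy_returns, benchmark_return=0.01)
print(f"t = {t_stat:.3f} with {len(strategy_returns) - 1} degrees of freedom")
```

Turning the t statistic into a p-value requires the t distribution, which the standard library does not provide; a statistics package such as SciPy (`scipy.stats.ttest_1samp`) handles that step. Keep in mind that a significant win/loss split only says the win rate is unlikely to be a coin flip; it says nothing about the size of the wins and losses.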

These can be accomplished in Python, R, Excel, or even Pine Script (using my SPTS library, which gives you the ability to calculate a paired and one-tailed t-test right within Pine Script). The details on how to do this are higher level and beyond the scope of this article, but I will continue the series on backtesting/forward testing into the future with some examples of how one can forward test within Pine Script and Excel.

Another method is to hold back the most recent data: test the strategy’s success over an earlier period, and then run it on the held-back, later period to see whether the results are comparable. If you notice a marked difference between the earlier period and the forward period, this should set off alarm bells. For example:
[Snapshot: strategy results in the earlier period vs. the forward period]

The above chart shows the difference that can happen due to changing sentiments and economic circumstances, and that a strategy can be inconsistent and contingent on external factors beyond our knowledge or control.
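The hold-back comparison described above can be sketched in Python as follows. The trade list, the 70/30 split, and the function names are assumptions made for illustration; in practice the trades would come from your own backtest export, in chronological order.

```python
def split_in_out_of_sample(trade_pnls: list[float], in_sample_fraction: float = 0.7):
    """Split chronologically ordered trades into an earlier and a later period."""
    cut = round(len(trade_pnls) * in_sample_fraction)
    return trade_pnls[:cut], trade_pnls[cut:]


def summarize(pnls: list[float]) -> dict:
    """Accuracy (%) and profit factor for one period."""
    wins = [p for p in pnls if p > 0]
    losses = [-p for p in pnls if p < 0]
    return {
        "trades": len(pnls),
        "accuracy_pct": 100.0 * len(wins) / len(pnls) if pnls else 0.0,
        "profit_factor": sum(wins) / sum(losses) if losses else float("inf"),
    }


# Hypothetical trade results, oldest first.
all_trades = [120.0, -40.0, 80.0, 60.0, -30.0, 90.0, 50.0, -200.0, -150.0, 40.0]

earlier, forward = split_in_out_of_sample(all_trades)
print("earlier period:", summarize(earlier))
print("forward period:", summarize(forward))
```

In this made-up series, the earlier period shows roughly 71% accuracy and a profit factor near 5.7, while the forward period drops to about 33% accuracy and a profit factor near 0.11. That is the kind of gap that should raise alarm bells.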

Conclusion
And that’s it! This marks my first educational article of 2025! Hopefully, you learned something you can apply to your trading. Be careful, and as always, safe trades, everyone!
