The eyes of the world are on Fukushima, Japan, where heroic technicians and military personnel struggle to prevent a nuclear armageddon. A massive earthquake spawned a devastating tsunami that knocked out a nuclear power plant’s cooling system. And here we are.
This reminds us how low-probability high-impact events tend to be correlated. If a country experiences prolonged drought, there is a heightened probability that it will also experience insurrection. If a company’s employees are caught engaging in fraud, there is a heightened probability that the company’s offices will also burn down. In Japan this past week, three low-probability high-impact events coincided: an earthquake, a tsunami and a nuclear catastrophe. In the capital markets, we describe this sort of phenomenon by saying that correlations become more extreme in periods of market turmoil.
During the 1980s, a number of Wall Street firms started developing probabilistic measures for their market risk. These would evolve into the value-at-risk (VaR) measures we know today. Following the 1987 stock market crash, however, employees at Bankers Trust became concerned about how their own measure modeled correlations. During the crash, as stock prices plummeted, treasury rates plummeted as well. This is typical behavior in a crash, but it was not reflected in their measure’s correlation matrix. That was derived from a time series analysis of past market moves, but the vast majority of those market moves occurred during periods of relative market tranquility. Consequently the correlation matrix indicated only a modest correlation between treasury rates and stock prices.
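One rough way to see this effect is to estimate the correlation separately for tranquil days and for turmoil days. The sketch below is illustrative only: the two series are made-up numbers, and the function names and the -3% threshold for labeling a day as "stress" are assumptions, not a description of Bankers Trust's measure.

```python
from math import sqrt
from typing import List, Sequence, Tuple

# Made-up daily data for illustration: equity returns (fractions) and
# changes in a treasury rate (percentage points). The last three days
# mimic a crash in which both plummet together.
EQUITY_RETURNS: List[float] = [0.01, -0.01, 0.01, -0.01, 0.005, -0.005,
                               0.005, -0.005, -0.05, -0.10, -0.20]
RATE_CHANGES: List[float] = [0.02, 0.02, -0.02, -0.02, 0.01, -0.01,
                             -0.01, 0.01, -0.10, -0.25, -0.40]


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Plain Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("a series with zero variance has no correlation")
    return sxy / sqrt(sxx * syy)


def correlation_by_regime(
    equity: Sequence[float],
    rates: Sequence[float],
    stress_threshold: float = -0.03,  # assumed cutoff for a "stress" day
) -> Tuple[float, float]:
    """Return (calm_correlation, stress_correlation).

    A day counts as stress when the equity return is at or below the threshold.
    """
    pairs = list(zip(equity, rates))
    calm = [(e, r) for e, r in pairs if e > stress_threshold]
    stress = [(e, r) for e, r in pairs if e <= stress_threshold]
    calm_corr = pearson([e for e, _ in calm], [r for _, r in calm])
    stress_corr = pearson([e for e, _ in stress], [r for _, r in stress])
    return calm_corr, stress_corr
```

With these toy numbers the tranquil-day estimate is close to zero while the turmoil-day estimate is strongly positive. Splitting a sample on one variable's tail is a crude diagnostic and can itself distort the estimates, so treat the output as a prompt for investigation rather than a calibrated input.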
Financial markets evolve over time. This means market data from thirty years ago—or even ten years ago—is not reflective of today’s markets. Most probabilistic market risk measures use less than five years of historical market data. That is inadequate, especially in light of the pronounced conditional heteroskedasticity and non-stationarity of financial markets.
This brings us to the topic of stress testing. When “value-at-risk” became a buzzword in the mid-1990s, so did “stress testing”, and the two terms were invariably used in the same sentence, over and over. It became a mantra that I can recite by heart:
Value-at-risk should always be supplemented with stress testing.
Back then, there was never much of a discussion of how to perform stress testing, or what specific shortcomings of VaR measures it might address, or how to perform stress testing so those specific shortcomings were indeed addressed. The mantra was like boilerplate language in some contract. VaR was novel. People were understandably uncertain about the new but admittedly powerful tool. It would have seemed cavalier to rely on just VaR for assessing market risk, hence the mantra.
Stress testing is “what if” analysis. You specify a hypothetical set of events—called “scenarios”—and then assess what the consequences of each might be. For example, you might suppose a severe earthquake spawns a devastating tsunami that knocks out a nuclear power plant’s cooling system. You could then assess the possible consequences as well as reasonable interventions. Of course, this particular scenario benefits from hindsight. We know those events just occurred. In practice, stress testing is used to assess hypothetical events, and one challenge is determining what hypothetical events to assess. With a little imagination, we can think of more plausible events than we have the time or resources to consider.
If stress testing is to supplement a probabilistic risk measure, such as VaR, this simplifies our task. Probabilistic risk measures employ a joint probability distribution for relevant risk factors, which goes beyond considering just one or a handful of plausible scenarios. Conceptually, this considers the entire range of possible scenarios, and it does so on a probability-weighted basis. For this reason, stress testing should not attempt to repeat what is already being done by a probabilistic risk measure. Instead, it should address shortcomings in that risk measure. Due to data constraints, the Achilles heel of most probabilistic risk measures is low-probability high-impact events. When it is used to supplement a probabilistic risk measure, that is what stress testing should assess.
In a financial context, stress testing typically takes one of two forms. One entails identifying a handful of standardized scenarios. These might be historical events, such as the 1998 failure of LTCM or the 2008 market meltdown. Or they can be entirely hypothetical events of particular relevance to a firm’s business or trading activities. Historical events that happened some time ago, such as the 1929 stock market crash, can be difficult to interpret for today’s markets. Do the best you can to create a scenario that captures the essence of what happened. Historical scenarios have the advantage that they are intuitively compelling. They actually happened. People are more likely to dismiss entirely hypothetical events.
With this approach to stress testing, the important thing is to consistently test the same scenarios every day or every week. Calculate the loss a portfolio would incur from each. The trend in those numbers is more important than the specific numbers themselves. If the loss associated with a particular scenario trends upward or spikes, further investigation is warranted.
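The routine described above can be sketched in a few lines. This is a minimal illustration, not a production method: it assumes a linear revaluation (position value times fractional shock), and the scenario names, shock sizes, function names and the 1.5x spike threshold are all hypothetical.

```python
from typing import Dict, List

# Hypothetical standardized scenarios: fractional change in the value of
# positions exposed to each risk factor. Real scenarios would be calibrated
# to historical episodes or to a firm's own business.
SCENARIOS: Dict[str, Dict[str, float]] = {
    "equity_crash": {"equities": -0.20, "treasuries": 0.05, "credit": -0.10},
    "credit_crunch": {"equities": -0.10, "treasuries": 0.03, "credit": -0.25},
}


def scenario_loss(positions: Dict[str, float], shocks: Dict[str, float]) -> float:
    """Loss (positive means money lost) under a simple linear revaluation."""
    pnl = sum(value * shocks.get(factor, 0.0) for factor, value in positions.items())
    return -pnl


def scenario_losses(
    positions: Dict[str, float],
    scenarios: Dict[str, Dict[str, float]] = SCENARIOS,
) -> Dict[str, float]:
    """Apply the same fixed scenario set to today's portfolio."""
    return {name: scenario_loss(positions, shocks) for name, shocks in scenarios.items()}


def flag_spikes(history: List[float], threshold: float = 1.5) -> bool:
    """Flag a scenario whose latest loss exceeds `threshold` times the
    average of its earlier losses. `history` is ordered oldest first."""
    if len(history) < 2:
        return False
    baseline = sum(history[:-1]) / (len(history) - 1)
    return baseline > 0 and history[-1] > threshold * baseline
```

In use, `scenario_losses` would run on each day's positions, with the result for each scenario appended to that scenario's history before calling `flag_spikes`. Keeping the scenario set fixed is what makes the day-to-day numbers comparable.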
The other approach to stress testing is to ask what sort of event would have to happen in order for a firm to suffer a massive loss. This approach can identify emerging risks, as a firm enters new markets or starts trading novel instruments. Clearly, firms should have been performing more of this sort of stress testing in the years leading up to the 2008 subprime crisis. To be effective, though, they would have needed to look at stress scenarios for the portfolios of mortgages underlying their CDOs.
This second form of stress testing is more difficult than the first because it is more open-ended. It requires initiative on the part of the investigator. A junior analyst cannot do it. You need a seasoned professional who has lived through a few market upheavals. But being open-ended is also a strength. In the hands of such an individual, it can be a powerful tool, providing both a license and a framework for ferreting out unseen risks.
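The judgment involved cannot be automated, but the mechanical part of the question "what would have to happen for us to lose this much?" can be sketched. The example below assumes a linear revaluation and a set of candidate shock "directions" proposed by the investigator. For each direction it finds how far the shock must be scaled before the loss reaches a chosen limit. All names and numbers are hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def breaking_multiplier(
    positions: Dict[str, float],
    direction: Dict[str, float],
    loss_limit: float,
) -> Optional[float]:
    """Return k such that applying shocks k * direction loses `loss_limit`.

    Returns None when the direction does not lose money at all.
    Assumes position values respond linearly to fractional shocks.
    """
    unit_pnl = sum(value * direction.get(factor, 0.0) for factor, value in positions.items())
    unit_loss = -unit_pnl
    if unit_loss <= 0:
        return None
    return loss_limit / unit_loss


def rank_directions(
    positions: Dict[str, float],
    directions: Dict[str, Dict[str, float]],
    loss_limit: float,
) -> List[Tuple[str, float]]:
    """List loss-making directions, smallest multiplier (most dangerous) first."""
    ranked = []
    for name, direction in directions.items():
        k = breaking_multiplier(positions, direction, loss_limit)
        if k is not None:
            ranked.append((name, k))
    return sorted(ranked, key=lambda item: item[1])
```

A small multiplier means a modest version of that event would already breach the limit, which is a cue for the investigator to dig further. Real portfolios with options or structured products are not linear, so a full revaluation would be needed there.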
Stress testing can also be performed on a stand-alone basis in the absence of any sort of probabilistic risk measure. In this context, it is often called scenario analysis. The focus need not be entirely on extreme events. Without a probabilistic risk measure, there may be a need to assess more routine risks as well.
For the third time since the 2008 subprime crisis, the European Union is conducting stress tests to assess the solvency of large European banks. Last year’s tests were based on three scenarios:
- a benchmark scenario that assumed current economic growth would continue for two years,
- an adverse scenario assuming two years of worsening economic conditions, and
- a “sovereign shock” scenario that had European government bond values falling substantially.
Results from that round met with skepticism. Regulators required banks to maintain only a modest capital ratio through each scenario, and the scenarios themselves were not particularly stressful. No scenario, for example, assumed a sovereign default. Only seven out of ninety-one banks failed the 2010 test. Of those that passed, several subsequently required government bailouts.
The EU stress tests were viewed—and still are viewed, to some extent—as politically influenced. There is a desire to ease concerns about EU banks’ solvency, and the tests were designed to do that. Unfortunately, risk analyses are often subject to such political influences. Stress tests—especially if they are informal—are particularly susceptible.
For this reason, it is important to standardize stress testing as much as possible. It is difficult for interested parties to interfere with or undermine an established procedure that has been followed meticulously for several years. You also want to firmly exclude interested parties from the selection of scenarios or the conduct of actual tests. If interested parties want to dispute results once they are released, they are more than welcome to do so. Stress test results are not right or wrong. Their purpose is to alert people to potential risks, get them thinking and motivate dialogue. That isn’t going to happen if interested parties can influence the process or suppress results they don’t care for.
Back in the 1970s, the designers of Japan’s Fukushima Daiichi nuclear power plant performed scenario analysis to assess a variety of risks, including tsunami risk. They did not consider a tsunami as big as the one that hit the plant last week. We don’t know how much that tragic decision was due to flawed analysis and how much it was the result of political influences. Whatever the case may be, the people of Japan are suffering the consequences.
Our hearts go out to them.