AMA model stability study 2014

  • 23 March 2017

In 2014 we performed a study to understand the factors that influenced how comparable the outputs of Members' AMA models were – read our findings below.


The objective of the Model Stability study was to benchmark the outputs of different operational risk capital models, to understand why differences exist, and to identify factors which would increase their comparability. Twenty-six ORX Member banks participated in the exercise.

For Credit and Market Risk the Basel committee has performed industry model benchmarking exercises based on hypothetical portfolios: two for the Trading Book and one for the Banking Book. In each case the committee’s focus was on portfolio-level risk-weighted asset consistency.

The committee concluded that the majority of differences in risk weights and capital requirements were driven by differences in risk, but also that “variations also arise from supervisory and bank practice-based idiosyncrasies, and these can result in material discrepancies” (Ingves, 2014). It further stated that, “While it is difficult to be precise on how much scatter is 'too much', the range of bank practice-based variations is uncomfortably wide.” Informal feedback suggested that a variation of at most ±20% around the median is acceptable.

The Basel committee has frequently stressed the need for greater comparability of risk-weighted asset measures between banks and over time (Ingves, 2014). Comparability is a sensible and reasonable objective for operational risk, but it is first useful to compare the regulatory regimes of the three major risk types, with specific focus on the requirements placed on internal models and the level of regulatory prescription.

Basel requires the following advanced approaches for the calculation of regulatory capital. For Market Risk, banks are required to use internal models to estimate a 99% value-at-risk over a 10-day horizon, which is combined with a specified multiplier. For Credit Risk, banks are required to use internal models to estimate the Probability of Default over 1 year through the cycle, which is used as an input into a specified model. For operational risk the requirement is arguably less restrictive but also more challenging: banks are given freedom to directly estimate a 99.9% value-at-risk over a 1-year horizon. No models or multipliers are specified; the only requirement is that the four elements – historical internal loss data, historical external loss data, Scenario Analysis and Business Environment and Control Factors (BEICF) – each contribute to the model.

This context informed the ORX study. Specifically, it was designed to understand how the following restrictions on internal models impacted the comparability of output:

  • a lower direct estimation percentile of value-at-risk in combination with a multiplier, analogous to the multiplier used in Market Risk
  • greater model specificity, analogous to the capital requirement formula for Credit Risk

Study design

By design the current regulatory standards offer great flexibility to banks in how the four operational risk model elements – Internal Data, External Data, Scenario Analysis and BEICF – are combined. This has resulted in a wide range of practice within the industry. In order to produce results which could be precisely compared, the study focussed on the Loss Distribution Approach (LDA) using only internal data.

Each participating bank was provided with a range of units of measure (UOM), which were to be treated as “internal data” and modelled using the LDA element of their production AMA models. UOMs were constructed using homogeneous subsets of real loss data taken from the ORX global database and drawn from significant risk types. The UOMs were chosen to explore the impact on modelling of a number of factors, including: the number of loss events, the size of the institution from which the data was drawn, the number of years of data which the UOM represented, and the severity of tail events. As a consequence, taken in totality, the UOMs did not form a coherent portfolio.

Each participant used the same specified frequency model, a Poisson distribution with UOM specific parameter, and fitted severity models using their current production methodology. These were combined using their current production method of convolution to build a model for estimating annual aggregate loss. Precise instructions were provided to remove other variable factors, such as the treatment of diversification, to ensure that the results were as comparable as possible.
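The frequency/severity convolution described above can be sketched with a generic Monte Carlo simulation. This is not any participant's production methodology; the lognormal severity and all parameter values below are placeholders chosen for illustration only.

```python
import numpy as np

rng = np.random.default_rng(42)

def simulate_aggregate_loss(lam, sev_sampler, n_years=100_000):
    """Monte Carlo convolution: each simulated year draws a Poisson event
    count, then sums that many severity draws into an annual aggregate loss."""
    counts = rng.poisson(lam, size=n_years)
    return np.array([sev_sampler(n).sum() for n in counts])

# Illustrative lognormal severity; parameters are placeholders, not study values.
def severity(n):
    return rng.lognormal(mean=10.0, sigma=2.0, size=n)

annual = simulate_aggregate_loss(lam=25, sev_sampler=severity)
for p in (90, 99, 99.9, 99.95):
    print(f"{p}th percentile of annual aggregate loss: {np.percentile(annual, p):,.0f}")
```

Reading percentiles off the simulated annual losses mirrors the target-percentile estimates participants were asked to report.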

Banks were asked to use their fitted models to produce estimates of annual aggregate loss for each UOM at a range of target percentiles between 90% and 99.95%. Participants also provided extensive details of the models which they employed, allowing a thorough validation of the study and an analysis of practice.


The results shown below provide a subset of the full analysis conducted on the study data. For clarity, the same unit of measure is used in each figure.

A summary of the findings is that:

  • banks created good models for severity, with divergence seen at the point where the models have to extrapolate beyond loss experience
  • there is good agreement between annual loss estimates, again diverging at the point beyond loss experience
  • aggregate loss estimates are typically conservative in comparison to empirical equivalents
Figure 1: Severity modelling. The chart shows the spread of severity fits within the participants' results. The y-axis shows percentiles of the severity distribution, and the x-axis shows the corresponding loss.


The first finding of the study was that banks created good models for severity. Figure 1 summarises the spread of severity fitting results from all participants for a UOM with 10 years of data. The black line represents the lowest 10% of the severity fits, the red line the highest 10%, and the grey line shows 95% confidence bounds around an empirical approximation (Rioux & Klugman, 2006, Toward a Unified Approach to Fitting Loss Models. North American Actuarial Journal, 10(2), pp. 147-153).

Figure 1 demonstrates that good agreement is seen within the data, in this case up to the 90th percentile (the 1 in 10 years point), but divergence is seen beyond it.

Figure 2: Annual aggregate loss distribution.


The second finding is that there is good agreement on annual aggregate loss at lower percentiles. Figure 2 shows the spread of annual aggregate loss results from participants for a UOM with 10 years of data. The black line represents the lowest (least conservative) 10% and the red line the highest (most conservative) 10% of the annual losses for a given percentile. The third line represents an empirical estimate* of the annual loss; this line turns from orange to grey at the point at which the empirical estimate may become inaccurate.

*An approximate “empirical” comparison can be constructed by using the empirical severity in the aggregation. Here a straightforward Monte Carlo simulation is used, generating the number of events from a Poisson distribution and the severities from an empirical CDF. For the ECDF, a simple linear interpolation of log(losses) is used in order to give a smoother result. A limitation of this method is that losses will not be generated in excess of the largest observed amount, so it is not suitable for extreme percentiles, but for moderate ones it provides a sensible approximate comparison.
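The empirical aggregation described in the footnote can be sketched as follows. The loss data here is synthetic, standing in for one UOM's internal data, and the Poisson rate is an arbitrary illustrative value.

```python
import numpy as np

rng = np.random.default_rng(0)

def ecdf_sampler(observed_losses):
    """Smoothed empirical severity: sample by inverting a linear interpolation
    of log(losses) against plotting positions. By construction no draw can
    exceed the largest observed loss."""
    logs = np.sort(np.log(observed_losses))
    probs = (np.arange(1, logs.size + 1) - 0.5) / logs.size
    def sample(n):
        u = rng.uniform(probs[0], probs[-1], size=n)
        return np.exp(np.interp(u, probs, logs))
    return sample

# Placeholder losses standing in for one UOM's internal data (not study data).
losses = rng.lognormal(9.0, 1.5, size=500)
sample_severity = ecdf_sampler(losses)

counts = rng.poisson(50, size=50_000)  # Poisson frequency, illustrative rate
empirical_annual = np.array([sample_severity(n).sum() for n in counts])
print(np.percentile(empirical_annual, [90, 95, 99]))
```

The cap at the largest observed loss is visible in the code: the interpolation can never return a value above `losses.max()`, which is exactly why this comparison breaks down at extreme percentiles.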

Again, the analysis demonstrates that good agreement is seen within the data but divergence is seen beyond it.

Figure 3: Deviation of annual aggregate loss around the median. The boxplots show the spread of estimates, the red line indicates ±20% around the median, and the green dots show the empirical estimate.


The analysis showed that the number of years of observations on which a model is built has the greatest influence on how comparable model outputs are at the aggregate loss level. Within the limits of observed data there is good agreement between different models, typically within a deviation of ±20% around the median estimate, but once models extrapolate beyond observed data, divergence is seen. Figure 3 provides an example: it shows that the divergence of models is within ±20% up to a target percentile of 97.5% and greater beyond this, and that the estimates are well above the empirical equivalent.
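The ±20%-around-the-median measure used above is straightforward to compute. The bank estimates below are hypothetical numbers invented for illustration, not study results.

```python
import numpy as np

def deviation_from_median(estimates):
    """Relative deviation of each bank's estimate from the cross-bank median,
    the comparability measure behind the +/-20% band."""
    e = np.asarray(estimates, dtype=float)
    med = np.median(e)
    return (e - med) / med

# Hypothetical estimates from five banks for one UOM at the 97.5% percentile.
est_97_5 = [118.0, 124.0, 131.0, 140.0, 152.0]
dev = deviation_from_median(est_97_5)
print(dev.round(3), bool(np.all(np.abs(dev) <= 0.20)))  # all within the band
```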

Submissions were grouped into “models” according to their body and tail distributions, ignoring other contributing factors such as attachment point, the method of parameter estimation and special treatment of distributions. The variation within a single “model” is frequently wider than the variation between “models”. An example of this is shown in the table below, where the ratio of the highest to lowest aggregate loss at 99.9% is more than 5.

Further detail on Empirical + GPD models for UOM 3

One of the objectives of this study was to understand the factors which affect the comparability of models. The study results suggest that a naïve restriction of practice would not lead to increased comparability: divergence within the same combination of body/tail models was greater than between different body/tail models, an effect due to the interaction of a number of other factors such as attachment point and parameter estimation method. To enforce comparability, a restriction of modelling practice would need to specify something approaching a standard model.

Recommendations to increase comparability

The evidence is that a restriction to a lower direct estimation target would be more effective. A typical relative convergence gain from 99.9% to 95% is a factor of 3.


Operational risk modellers have been set an ambitious challenge, particularly given that it is a new discipline and historical data is limited, but internal models have been given few restrictions.

It is difficult to directly compare the regulatory regime of OR with Credit and Market Risk, but it appears that OR could learn from the other risk disciplines in terms of how they achieve comparability and simplicity.

To increase comparability, supervisors could restrict the level (percentile) at which banks are allowed to use their models to estimate aggregate loss, and provide a more mechanical method for inferring the 99.9% capital. This study shows that the typical relative convergence gain in restricting direct estimation from 99.9% to 95% would be a factor of 4. A restriction on the percentile at which banks can use internal models to directly estimate aggregate loss offers other ancillary benefits, such as increased compatibility with typical estimates used in scenario analysis and an increase in the use of the model for risk management purposes.
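One simple way to express such a convergence gain is the ratio of the cross-bank spread at 99.9% to the spread at 95%. The estimates below are hypothetical, chosen only so the ratio lands near the factor quoted above; they are not study data.

```python
import numpy as np

def relative_spread(estimates):
    """Range of bank estimates relative to their median: a simple
    cross-bank comparability measure at one target percentile."""
    e = np.asarray(estimates, dtype=float)
    return (e.max() - e.min()) / np.median(e)

# Hypothetical cross-bank estimates for one UOM (illustrative numbers only,
# chosen so the spread ratio lands near the factor quoted in the study).
at_95 = [100.0, 105.0, 110.0, 118.0, 122.0]
at_99_9 = [300.0, 360.0, 450.0, 560.0, 660.0]

gain = relative_spread(at_99_9) / relative_spread(at_95)
print(f"convergence gain from restricting 99.9% -> 95%: {gain:.1f}x")
```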

This study has also shown that a naive restriction of practice would not lead to more convergent results. A greater industry understanding of how the elements of LDA models interact is needed before good practice guidance can be given which could achieve stability and comparability.

Follow up work

Building on this study, we did further work benchmarking the relationship between AMA Capital at different percentiles and wrote a series of papers on good practice in LDA modelling.

Find out more

If you want to know more about this study please get in touch.