There are many industries that use models to help with their decision-making, not just financial services. This section presents views from other industries, gleaned from on-site interviews or from the contributors’ personal work experience.
5.1. Weather Forecasting
5.1.1. On 6 January 2017, Andy Haldane, Chief Economist at the Bank of England, famously likened the collective failure to predict the 2008–2009 financial crash to what he called a “Michael Fish moment” for economists (BBC, 2017). To his eternal chagrin, Michael Fish was unfortunate enough to have been on duty as the BBC evening Met Office weather forecaster a few hours ahead of a most dramatic storm event over the southern United Kingdom in October 1987. Haldane’s jibe seems altogether inauspicious for us, as we look to the meteorological and environmental sectors for lessons to be learned for our immediate needs in the insurance, finance and banking sectors.
5.1.2. And yet, as it happens, in March of the previous year (1986), the Royal Society and British Academy had convened a joint Symposium on “Predictability in Science and Society” (Mason et al., 1986). It covered the gamut of disciplines, from “Historical Inevitability and Human Agency in Marxism” (Cohen, 1986) to “The Recently Recognized Failure of Predictability in Newtonian Dynamics” (Lighthill, 1986) – and a good deal in between, including “Prediction and Economic Theory” (Sen, 1986), “Application of Control Theory to Macro-economic Models” (Westcott, 1986) and “The Interpretation and Use of Economic Predictions” (Burns, 1986). Unsurprisingly, Sir John Mason (sometime Director of the Met Office) contributed a paper on “Numerical Weather Prediction” (NWP). Significantly, it provides insights into how the use of models in weather forecasting had enabled forecasts to be improved markedly since the 1960s. In addition, it sets out some principles for gauging and tracking forecasting “skill” – principles that remain in place today.
5.1.3. Making progress: Already in 1986, Mason was able to report significant progress in NWP since the 1960s. Importantly for present purposes, some of the progress Mason records is charted in terms of a measure referred to as skill. Thus, we have this:
Although RMS [root mean square] errors and correlation coefficients are useful indicators of the performance of different models for the same area and period, they are only partial indicators of the model’s predictive skill. A better judgement is obtained by comparing the forecast’s RMS errors with the long-term climatological variance or with the errors of a zero-skill forecast based on persistence (no change) from the initial conditions (Mason, 1986, page 53).
5.1.4. In other words, progress can be gauged according to the improvement in, say, the root mean square (RMS) error of the given forecast relative to that of the naïve forecast of tomorrow’s weather being the same as today’s – that most rudimentary of straight-line, indeed horizontal, extrapolations. In 1984, certain features of the 72-hour-ahead forecast showed RMS errors at just 48% of the naïve (persistence) forecast. These errors had been at the level of 80% 10 years previously. Errors at that 80% level in 1984 were not reached until the 6-day-ahead forecast, “suggesting a gain of three days in predictive skill” (Mason, 1986, page 53).
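By way of illustration, the sketch below computes Mason’s skill measure for a toy forecast series: the model’s RMS error expressed as a percentage of the error of the zero-skill persistence forecast. The data are entirely hypothetical and serve only to show the arithmetic.

```python
import numpy as np

def rms_error(forecast, observed):
    """Root mean square (RMS) error between forecast and observed values."""
    forecast, observed = np.asarray(forecast), np.asarray(observed)
    return np.sqrt(np.mean((forecast - observed) ** 2))

# Hypothetical daily observations and forecasts (e.g. temperature, deg C).
observed    = np.array([12.1, 9.4, 10.8, 13.0, 11.2])
model       = np.array([11.8, 9.9, 10.1, 12.5, 11.6])  # model's day-ahead forecast
persistence = np.array([11.0, 12.1, 9.4, 10.8, 13.0])  # "no change": previous day's values

# Mason's measure: model RMS error as a percentage of the zero-skill
# (persistence) error. Below 100% indicates positive skill; Mason suggests
# forecasts cease to be useful once the ratio exceeds roughly 75%.
ratio = 100 * rms_error(model, observed) / rms_error(persistence, observed)
print(f"RMS error as % of persistence error: {ratio:.0f}%")
```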
5.1.5. What lay behind such progress? Unsurprisingly, it was investment in computing power and model “complexity”. A 10-level northern-hemisphere model had been introduced in 1972 and a 15-level global model in 1982. RMS errors roughly halved over 1972–1984: 72-hour-ahead forecasts in 1986 were as good as 48-hour-ahead forecasts had been 7 years previously, and 48-hour-ahead forecasts were as good as 24-hour-ahead forecasts had been.
5.1.6. Mason proceeds to observe that:
Numerical forecasts are unlikely to provide good or useful guidance for the issue of surface weather forecasts if the RMS error exceeds 75% of the persistence error (Mason, 1986, page 54).
5.1.7. From this he goes on to address the matter of the scope for improvements in predictive skill, from which (on the basis of a hypothetical, simulated case study) he concludes:
These figures [numerical details of forecast persistence errors and skill from the case study] suggest that it will, in general, be very difficult to produce useful deterministic forecasts of synoptic-scale developments for more than 14 days ahead … (Mason, 1986, page 58).
5.1.8. Yet the 1986 Symposium was about Predictability (and the “failure of predictability in Newtonian dynamics”, as Lighthill (1986) put it). Thus, it is to this that Mason turns to close his contribution. Acknowledging his question as a rhetorical one, he asks:
[W]ould it be possible to predict the atmospheric evolution from an initial state with infinite precision infinitely far ahead? (Mason, 1986, page 58).
5.1.9. His answer, of course, is “no”, and on two accounts. First, the entirety of the initial state cannot be observed in principle, even if it could be observed in the absence of measurement error. Second, while atmospheric behaviour does have some periodic components (e.g. diurnal and annual fluctuations), it has a strong aperiodic component, notably the movement of cyclones and anti-cyclones across middle-latitude continents and oceans. “An aperiodic system is inherently unstable”, Mason tells us, “so that the imposition of a random disturbance will render it chaotic (i.e., unpredictable) in the long run”.
5.1.10. Today, the Met Office (2017) is still able to report on “Continually Improving Our Forecasts”. No longer is the 15-level global model of Sir John Mason’s day in use – but the same kinds of error statistics and index of forecasting skill surely are. Specifically, progress has been achieved through “[i]nvesting in technology, scientific expertise and verification”. The technology includes an IBM supercomputer, upgraded in 2012 and capable of 1,200 trillion calculations per second, and a 70-level model at the global scale (along with similar 70-level models of progressively more finely resolved spatial detail for Europe and the United Kingdom). Investment in these models and in the function of verification, to which a group of analysts is entirely dedicated, has enabled the Met Office (as it reports) to outperform five of the major operational NWP centres. Over the period August 2009 to February 2013, the Met Office’s NWP (verification) Index rose from just over 117 to just over 123 (the target set for March 2013), almost without faltering (Met Office, 2011).
5.1.11. Across three and more decades, then, impressive progress has been made in respect of the statistic of forecasting skill and the accuracy of the models used for NWP.
Targets for forecasting capability are set by the Met Office; they are to achieve a specified value of the NWP Index by a specified date. Progress towards (or away from) this target is tracked publicly: with transparency, that is, and for all to witness.
5.1.12. Monitoring progress: institutional arrangements: In 2000, the World Meteorological Organisation (WMO) produced its Technical Document TD 1023 “Guidelines on Performance Assessment of Public Weather Services”. The web page introducing this document succinctly (and significantly) shifts emphasis away from the statistics of forecasts and the verification of models towards user satisfaction with the model’s forecasts. It states:
The aim of the evaluation is twofold: firstly, to ensure that products such as warnings and forecasts are accurate and skilful from a technical point of view and secondly, that they meet user requirements, and that users have a positive perception of, and are satisfied with the products (WMO, 2017).
5.1.13. Technical Document 1023 pushes the point further home:
Forecast accuracy is irrelevant if the forecast products are not available to the public at a time and in a form that is useful.
An assessment programme can be seen in the context of a quality system, where it is important to ensure that the information gathered and processed is focussed on user requirements, to be used in making decisions and taking actions to improve performance, rather than just being gathered for the sake of it (WMO, 2000, page 1).
5.1.14. Of course, this is not to say that verification is unimportant. As the web page states:
The main goal of a verification process is to constantly improve the quality (skill and accuracy) of the services. This includes:
∙ Establishment of a skill and accuracy reference against which subsequent changes in forecast procedures or the introduction of a new technology can be measured;
∙ Identification of the specific strengths and weaknesses in a forecaster’s skills and the need for forecaster training and similar identification of a model’s particular skills and the need for model improvement; and
∙ Information to the management about a forecast programme’s past and current level of skill to plan future improvements; information can be used in making decisions concerning the organisational structure, modernisation and restructuring of the National Meteorological service.
5.1.15. In this we can see the virtue of consistency of model and forecast assessment (not just transparency).
5.1.16. Nevertheless, out of a total of 32 pages of text in the main body of WMO TD 1023, 15 are devoted to “User-based Assessment”. Their content covers variously: surveys of the user community (almost the entire appended material is an exemplar of such a survey); focus groups; public opinion monitoring; feedback and response mechanisms; consultations through users’ meetings and workshops; and the collection of what are referred to as “anecdotal data”. We should be left in no doubt about the emphasis national weather forecasting services are urged (by the WMO) to place on user-based assessment, vis-à-vis verification.
5.1.17. Admiring what we cannot have: There are things that transfer readily across the disciplines and sectors, from atmospheric physics and weather forecasting to economics and the insurance industry, and there are things that do not.
5.1.18. On the positive side of the ledger, economic–financial forecasting error statistics can be reported just as they are for weather forecasting. For instance, Sir Terence Burns’ contribution to the 1986 Symposium on Predictability mirrors that of Sir John Mason. Error statistics are plotted for 1-year- and 2-year-ahead forecasts of GDP and the retail price index, for the years 1963–1985 and 1971–1985, respectively (Burns, 1986). Salient, however, is the absence of corresponding statistics for (economic) forecasting skill which, as is quite apparent from the foregoing, is distinctively central to weather forecasting.
At the time, Burns was working for HM Treasury.
5.1.19. On the negative side of the ledger, and as the Preface to the 1986 Predictability Symposium observes (Mason et al., 1986), there is this:
The weather forecast does not affect the weather, but the economic forecast may well affect the economy!
5.1.20. Adding to this obvious (and profound) difference, if not elaborating it expressly, Sen opened the Symposium with his paper on “Prediction and Economic Theory”. In it he reasoned that the difficulty of economic prediction lay then (as it doubtless still does) in the complexity of what he called “the choice problem” and “the interaction problem”:
One source of this complexity [in how economic influences operate] lies in the difficulty in anticipating human behaviour, which can be influenced by a tremendously varied collection of social, political, psychological, biological and other factors. Another source is the inherent difficulty in anticipating the results of interactions of millions of human beings with different values, objectives, motivations, expectations, endowments, rights, means and circumstances, dealing with each other in a wide variety of institutional settings (Sen, 1986, pages 4–5).
5.1.21. The choices resulting from human behaviour may well subsume the processing of forecasts of future system behaviour deriving from a computational model – something we have referred to in the discussions of our Working Party as the problem of “endogeneity”. But Sen (1986) makes little reference to the quantitative side of economic forecasting.
5.1.22. Nearly three decades later, Greenspan (2013) certainly does. Indeed, his book bears the title The Map and the Territory, qualified (significantly) by the subtitle Risk, Human Nature, and the Future of Forecasting. The book is replete with tables and time-series of economic and financial statistics; regression analysis is prominent. What Greenspan has to say of the future of (economic) forecasting deserves to be reported in some detail. In doing so, we seek to redress the rather negative balance in our comparison (from 1986) of the gulf between weather forecasting and economic forecasting.
5.1.23. To begin, Greenspan reaches back to a time well before 1986. He wants to anchor what he refers to as the “propensities” of human nature in what Keynes called “animal spirits”:
My enquiry begins with an examination of “animal spirits”, the term John Maynard Keynes famously coined to refer to “a spontaneous urge to action rather than inaction, and not as the [rational] outcome of a weighted average of quantitative benefits multiplied by quantitative probabilities”. Keynes was talking about the spirit that impels economic activity, but we now amend his notion of animal spirits to its obverse, fear-driven risk aversion (Greenspan, 2013, page 8).
5.1.24. Greenspan proceeds to define a dozen and more of his human propensities, ranging from fear, euphoria and herd behaviour to time preferences, home bias and family dependency. He does so because what Haldane dubbed the “Michael Fish moment” for economic forecasting – the Great Financial Crisis of 2008–2009 – was something of an epiphany for Greenspan:
[For] now, after the past several years of closely studying the manifestations of animal spirits during times of severe crisis, I have come to the view that there is something more systematic about the way people behave irrationally, especially during periods of extreme economic stress, than I had previously contemplated. In other words, this behavior can be measured and made an integral part of the economic forecasting process and the formation of economic policy. [Emphasis added]
In a change of my perspective, I have recently come to appreciate that “spirits” do in fact display “consistencies” that can importantly enhance our ability to identify emerging asset price bubbles in equities, commodities, and exchange rates — and even to anticipate the economic consequences of their ultimate collapse and recovery (2013, page 9).
5.1.25. And so it is that in the closing chapter (“The Bottom Line”), we find Greenspan’s manifesto for the future of (economic) forecasting, summarised by this sequence of quotes. First, on page 291:
When I was first contemplating the substance of this book, I was fully aware that a basic assumption of classical and neoclassical economics — that people behave in their rational long-term self-interest — was not wholly accurate. Moreover, the crisis of 2008 had impelled me to reassess my earlier conclusion that our animal spirits were essentially random and hence impervious to economic modeling. I was amazed, however, during the early months of this venture at just how many supposedly random variables were explained by statistically highly significant regression equations. Many, if not most, economic choices, the data show, are demonstrably stable over the long run for as far back as I can measure (Greenspan, 2013).
5.1.26. Second, on page 292:
Producing a fully detailed model is beyond the scope of this book.
These models [those of the future] should embody equations that, when possible, measure and forecast systematic human behavior and corporate culture (Greenspan, 2013).
5.1.27. Then, third, on page 293:
But we are far removed from the halcyon days of the 1960s, when there was great optimism that econometric models offered new capabilities to accurately judge the future.
This journey of analysis has finally come to rest in a place I could never have contemplated when I first began to recalibrate my economic views in the light of what the crisis of 2008 was telling us about ourselves (Greenspan, 2013).
5.1.28. Thus, to conclude, on page 299:
[W]e are driven by a whole array of propensities — most prominent, fear, euphoria, and herd behavior [at most, three of the thirteen] — but, ultimately, our intuitions are broadly subject to reasoned confirmation (Greenspan, 2013).
5.1.29. Considerations for actuarial models: What, then, are the lessons to be learned from this look over our professional, disciplinary boundaries across to weather forecasting? What does all this mean – the weather forecasting of today and 1986 and economic forecasting of 1986 and today – for practical progress in communicating and managing model risk in the insurance industry?
5.1.30. Significantly, we (as actuarial professionals) cannot enjoy the detachment of the mechanics of future weather from today’s model-generated forecast of it. Neither may we cling to the aspiration that (one day) the truth of the matter will be revealed in some gargantuan set of differential equations and an unbelievably all-encompassing, finely granulated, real-time observing system for generating (objective) facts and data – about all those human intentions and interactions to which Sen (1986) refers.
5.1.31. Yet, there might be scope for reporting (somewhere) the statistics of our forecasting skill, with consistency, so that progress (or not) may be tracked over the years and decades, and with transparency – for those who have a “right” to see into the black boxes of modelling in the insurance industry. True, the user audiences and constituencies served by national weather forecasting institutions may be very different from those served within and by insurance businesses large and small.
5.1.32. Nevertheless, a leaf or two might be taken out of the WMO’s Technical Document TD 1023 “Guidelines on Performance Assessment of Public Weather Services”. We have much in sympathy with its focus on user satisfaction and users’ positive perceptions of models and model-generated forecasts. After all, given Andy Haldane’s jibe, and as observed by several members of our Working Party, modellers, models and their forecasts – dreaded experts with their dreaded expertise, no less – are not held in high public esteem at present (see also Williams, 2017). At the very least, this case study (in particular, the WMO Technical Document) re-emphasises what the insurance industry and profession already seek to achieve with their Continuing Professional Development activities and their Technical Actuarial Standard (TAS) protocols.
5.1.33. The skill of our models and the skills of our modellers are ever “works in progress”; as such they are in need of active continual improvement. That much we can take from our admiration of the practice of weather forecasting. But are we asking whether we have the right measures of skill for our sector, that is, ones that motivate improvement, as opposed to enabling more boxes to be ticked with ever greater routine efficiency?
5.1.34. The key is that there are some “positives” involved in the use of models in the insurance industry, not just the perceived “negatives” of yet more procedures to be followed for the purposes of complying with regulations. How exactly our profession might go about this in a sincere and genuine manner may be a sizeable challenge. We have no wish to be accused of yet more “spin” and obfuscation with what the public already looks down on as the “black boxes” of our models.
5.1.35. In the short term – building upon the use test of Solvency II, for instance, and taking the pragmatic business-person’s perspective – we might seek to lessen the presently overly strong association of models with the “burdens”, “obligations” and “worries” of capital allocations. Imagine, for example, a firm’s model users (as opposed to the model developers) parameterising its models directly, something which would not be possible for the consumers of weather-forecasting products. Indeed, given Greenspan’s reported success in encapsulating his “human propensities” in the statistical forms of fat-tailed distributions and regression relationships, the nature of the model and the language surrounding its discussion and parameterisation might thus come a step closer to the familiar, colloquial terms of everyday business (as opposed to the abstractions of computer software). We might even suggest there could be a certain user-friendly “greying” or “colouring” of the model in this. Furthermore, accounting better for these human propensities lies at the root of reversing the low esteem in which economic forecasting is held, relative to weather forecasting.
5.1.36. In the longer term, there may be scope for developing ways of designing and using models to address the challenges of “group-think” forming and then crystallising out in the making of insurance business decisions. The HM Treasury Report of 2013 (HM Treasury, 2013) was well aware of the difficulties associated with group-think in respect of the use of models in support of government decision-making, as already discussed in our Sessional Paper from Phase 1 (Aggarwal et al., 2016, pages 291–292, in particular). Group-think suggests a firm is, as it were, “touching just one base” in its deliberations prior to coming to the actionable decision. The firm is using just a single rationality: a single view of how markets work, with a set of business aspirations and risk preferences for the future similarly aligned with just this single mental model of the way the world works. In particular, in the context of Figure 2 (see sections 1.1.16–1.1.17), group-think would correspond to parameterising the computational model according to just one of the four cultures of model users: that of solely the “Confident model users”, or solely that of the “Conscientious modellers”, the “Uncertainty avoiders”, or the “Intuitive decision makers”. In other words, this is the situation in which just the one predominant view in the firm is aired before the decision is made (and alternative views are probably not heeded, nor even canvassed). There are precedents for how a plurality of views and aspirations might be explored computationally, that is, the means to “touch all four bases” before settling upon the decision. Oddly enough, these precedents can be found in the differential-equation-dominated worlds of climate change (van Asselt & Rotmans, 1996) and environmental protection (Beck, 1991, 2014). There are distinct echoes in them of the Reverse Sensitivity Testing touched on above (in sections 4.6.13–4.6.16). But, as we say, technically facilitating this line of enquiry, and then implementing it in practice, might be something for the more distant future.
5.2. Aerospace
5.2.1. The aerospace industry is a sophisticated user of modelling techniques. Two well-known areas include: (i) computational fluid dynamics (CFD) modelling of aerodynamic responses in support of airframe design; and (ii) the use of automated flight control systems that underpin both “fly by wire” assistance for human-controlled flight and autopilot functionality.
5.2.2. When things go wrong, detailed investigations into the cause of any incidents are carried out by independent investigators and learning points for design, manufacture or operation are published. The learning points often become regulatory imperatives.
5.2.3. Key points for CFD: The reason for using these techniques is to make the overall design process more efficient and to reduce the time to market. This does not remove the need for wind-tunnel testing and flight-testing in the later phases of development, because ultimately it is the physical aircraft that must fly in the real world, not the model in a simulation.
5.2.4. Whilst CFD modelling can make the overall process more efficient, it comes with its own costs: the modelling expertise it demands and the considerable computer power needed to provide the required accuracy.
5.2.5. Among the significant modelling challenges are the need to divide the three-dimensional modelling space using a practically sized grid, the need to ensure that the individual “cells” of the grid communicate adequately with each other, and the modelling of discontinuities in the physical world.
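As a much-simplified illustration of neighbouring cells “communicating”, the sketch below steps a one-dimensional diffusion stencil in which each interior cell is updated from its immediate neighbours. The grid size, coefficient and boundary treatment are purely illustrative; real CFD solvers work on three-dimensional meshes with far more elaborate numerics.

```python
import numpy as np

# Minimal 1D diffusion stencil: the simplest form of cell-to-cell
# "communication" on a finite-difference grid. All parameters here are
# illustrative only.
n_cells, alpha, dt, dx = 50, 0.1, 0.01, 1.0
u = np.zeros(n_cells)
u[n_cells // 2] = 1.0  # an initial disturbance in the middle cell

for _ in range(100):
    # Each interior cell is updated from its left and right neighbours;
    # the two end cells are held fixed (a crude boundary condition).
    u[1:-1] += alpha * dt / dx**2 * (u[2:] - 2.0 * u[1:-1] + u[:-2])
```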
5.2.6. Once the CFD modelling phases are complete then the testing moves into real-world validation with a series of physical models. Differences between the modelled result and the physical results are a driver for change as the physical development continues. The differences between modelled and physical results may also reveal potential enhancements to the modelling tools.
5.2.7. Aerospace components are designed and tested within a complex “envelope” covering multiple parameters such as weight, altitude, velocity, attitude, banking angles and so on. To ensure the integrity of the individual components and the safety of the overall aircraft, operation outside of the accepted “envelope” is not permitted.
5.2.8. Considerations for actuarial models: Although actuarial models do not have a physical representation, there will be opportunities to compare the results of an actuarial model with the real world that it is intended to represent. This should form part of the model review process that regulations or good practice require.
5.2.9. The independence of CFD modelling and wind-tunnel tests is self-evident. The latter form a key sense-check on the computed results, exposing modelling errors arising from flaws in the coding or execution. In the actuarial modelling world there may be only a single model and its software implementation. Where the model is very complex, and perhaps produces counter-intuitive results in some circumstances, there may be benefits in constructing an independent model that can be used to validate key features. “Back of the envelope” checks are much harder to perform with a calculator in the 2010s.
5.2.10. Modern financial instruments, consumer products, demographics and customer, and management actions may all contain discontinuous distributions and non-linear responses. Actuarial model design should identify these features and assess their potential to create material discontinuities in the model results that are used for decision-making. The selection of modelling granularity is likely to have greater relevance in one or more of the modelling dimensions where discontinuities exist.
5.2.11. The laws of physics do not change, but the markets and demographics that actuarial models are created to represent do. Actuarial models may benefit from having an “operating envelope” defined for them, which may reduce the risk of a model being used in inappropriate or untested environments where the results have not been proven correct.
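As a sketch of what such an envelope might look like in code, the fragment below guards a model run by checking its parameters against the ranges over which the model has been tested. The parameter names and limits are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class EnvelopeLimit:
    low: float
    high: float

# Hypothetical "operating envelope": the ranges over which the model has
# actually been tested and validated. In practice this would be maintained
# alongside the model's validation evidence.
ENVELOPE = {
    "equity_shock": EnvelopeLimit(-0.50, 0.50),   # tested range of equity stress
    "interest_rate": EnvelopeLimit(-0.01, 0.10),  # tested range of yield levels
    "lapse_rate": EnvelopeLimit(0.00, 0.30),      # tested range of lapse assumptions
}

def check_envelope(params: dict) -> list:
    """Return warnings for any parameter outside the tested envelope."""
    warnings = []
    for name, value in params.items():
        limit = ENVELOPE.get(name)
        if limit and not (limit.low <= value <= limit.high):
            warnings.append(
                f"{name}={value} is outside the tested envelope "
                f"[{limit.low}, {limit.high}]; results are unvalidated."
            )
    return warnings

# Example: a run that strays outside the tested lapse range is flagged.
print(check_envelope({"equity_shock": -0.30, "lapse_rate": 0.45}))
```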
5.2.12. Key points for flight control systems: Flight control systems have been created to reduce workloads for flight crew, and the current generation of systems is now capable of carrying out nearly all phases of flight without human intervention.
5.2.13. These systems prevent the flight crew from pushing individual settings, or the performance of the aircraft in one particular area, outside the operating envelope. There have been cases where the envelope, which is a complex set of inter-related factors, was incorrectly implemented in software, and this was a contributing factor in the loss of life.
5.2.14. In some military applications the flight control system goes a step further: the intentional instability of such aircraft, designed this way to provide additional manoeuvrability, means that flight computers must be used, as a human could not normally fly the aircraft without their assistance.
5.2.15. Critical systems areas may be engineered with multiple levels of redundancy to reduce the risks arising from single points of failure. The redundancy may involve physical components such as sensors and actuators and also multiple software routines. These can also be combined to take a majority “vote” on actions to be taken in the event of conflicting or missing signals and “fail safe” designs which reduce the risk of a wider problem arising when some components or processes fail.
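The voting idea itself is simple to sketch. Below, three redundant readings (whether from physical sensors or from independent software routines) are combined by taking the median, so that a single faulty channel cannot drive the result; the values and tolerance are illustrative.

```python
import statistics

def vote(readings, tolerance=0.05):
    """Return the median of redundant readings and flag any outlier channel."""
    consensus = statistics.median(readings)
    outliers = [i for i, r in enumerate(readings)
                if abs(r - consensus) > tolerance * max(abs(consensus), 1e-9)]
    return consensus, outliers

# Example: channel 2 has drifted; the median ignores it and it is flagged.
consensus, outliers = vote([101.2, 101.4, 87.0])
print(f"consensus={consensus}, suspect channels={outliers}")
```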
5.2.16. Considerations for actuarial models: The processes wrapped around many actuarial models have already required significant investment to automate and streamline, to meet shorter reporting cycles and enable operating efficiencies. There is little new to be found in considering automation per se.
5.2.17. A more interesting area to explore is whether actuarial models and their processes have clearly defined operating envelopes to ensure that they are not used beyond the boundaries of their design.
5.2.18. Actuarial models will often be run within organisations that have business continuity plans that provide for redundancy in office locations or computer systems. At a more localised and granular level there may be some benefit in exploring how a model would be run in the absence of one or more areas of input data. For example, if there was a significant change in market values and a model needed to be re-run, but a set of scenarios required as input to the model was not available, consideration could be given to how an approximation to the inputs could be created or how previous model results could be reused to allow for the new conditions.
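A minimal sketch of such a fallback is given below, assuming a hypothetical file layout: if today’s scenario set cannot be loaded, the previous set is reused, crudely rescaled by an observed market move, and the substitution is flagged. The file names and rescaling rule are illustrative only.

```python
import numpy as np

def load_scenarios(path):
    """Load a scenario set from a CSV file, or return None if unavailable."""
    try:
        return np.loadtxt(path, delimiter=",")
    except OSError:
        return None

scenarios = load_scenarios("scenarios_today.csv")  # hypothetical file name
if scenarios is None:
    previous = np.loadtxt("scenarios_previous.csv", delimiter=",")
    # Crude approximation: shift the previous scenario set by the observed
    # move in a market index so the re-run reflects current conditions.
    market_move = 0.97  # hypothetical ratio of today's index to the previous run's
    scenarios = previous * market_move
    print("WARNING: today's scenarios unavailable; using rescaled previous set.")
```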
5.2.19. Key points for incident investigation: The purpose is to learn lessons for the future and reduce the risk of loss of life, injury and also the consequent financial impacts. Independent investigators examine, with the widest of remits, any and all factors that may have contributed to the incident. Investigations may also be carried out into near misses and other events that exhibit the potential to have caused a more significant incident.
5.2.20. Areas of investigation cover design, manufacture, maintenance, operation, security, procedural and human factors, and in many cases an incident will be found to have been caused by multiple factors, often from different areas.
5.2.21. A key foundation of all investigations is the data retrieved from the flight data recorder and the voice recorder – the so-called black boxes. Consequently, the performance and resilience of these recorders are standardised and mandated, depending on the category of aircraft and its type of operation.
5.2.22. The human factors involved include the relationship between the flight crew and how this may have impacted on their performance. Factors that may be examined include experience, corporate seniority, procedures and training.
5.2.23. Considerations for actuarial models: Problems with actuarial models have consequences that are on a lower scale than those in the aerospace industry, but they may still have high financial and social costs.
5.2.24. Individuals with the relevant professional skills and independence should therefore carry out investigations into significant model failures or underperformance.
5.2.25. Investigation into model performance should not, however, be limited to failure, but be built into the normal operating processes. In many areas of actuarial modelling this review process is built into the regulatory framework.
5.2.26. Human factors are an important area for users of actuarial models, where user here should be taken to mean everyone from the model operators, through management to the ultimate decision-makers who rely on the outputs. (It is an oversimplifying generalisation to say that the builders and runners of models need to improve their communication skills and that the ultimate decision-makers need to improve their understanding of the construction and the limitations of a model … but when financial models go wrong, those factors are often present.)
5.3. Software Development: Design and Testing
5.3.1. The development of models and the development of software are closely interlinked. For the purposes of this section, a distinction is made between the “conceptual model” and the “software implementation” of that model. In theory, if not in practice, results from a “conceptual model” could be calculated using a pad of paper, a pencil and a calculator.
5.3.2. From the 1980s, the personal computer revolution placed ever-increasing computer power in the hands of actuaries, enabling ever more sophisticated models to be implemented. Actuarial software implementations use a combination of specialist actuarial tools, general-purpose databases and spreadsheet systems and bespoke code. In all of their software design, build, test and deployment activities, actuaries have had access to the expertise of information technology (IT) professionals and to the evolving tools and techniques of that profession.
5.3.3. Software development is a relatively new industry with ever more diverse applications and continued rapid growth, and its methodologies have continued to evolve and adapt. Over the past decade, “Test-Driven Development” (TDD) and “Behaviour-Driven Development” (BDD) methodologies have been widely adopted to support faster development cycles. These methodologies are often deployed with “Continuous Integration” – a technique whereby incremental changes are made to software on a frequent basis (often daily) and an evolving development version of a software system is always being run and tested.
5.3.4. Whilst no one single style or methodology of system development can be said to be best suited to the development of actuarial conceptual models and their software implementations, these newer techniques bring from the IT industry some vocabulary, methodology and standardisation that may be of use. In addition, these methods formalise and support some of the styles of rapid application development that many actuarial professionals have used for the past 20+ years.
5.3.5. Key points for TDD/BDD: The essence of these techniques is that the tests for the new software are defined and created up-front, before the new code is written. Usually the tests themselves will be part of a testing framework that executes the new software as it is created.
5.3.6. The new software is then incrementally developed to meet the requirements of the tests. In general the “TDD” name is applied when dealing with relatively small pieces of code, whilst “BDD” applies to a system or a subsystem.
5.3.7. Benefits arising from TDD/BDD include clearer documentation of what the software has been designed to do and whether, according to the test status, it is capable of doing it as required.
5.3.8. A corollary of the scope of the TDD/BDD test suite is that the software should not be considered capable of performing a function, or dealing with a situation, for which there is no explicit test.
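To make the test-first idea concrete, the sketch below writes the test before the function it exercises, for a small, well-specified actuarial calculation. The function name and tolerance are illustrative; in practice a framework such as pytest would discover and run the test.

```python
import math

def test_annuity_factor():
    # Defined up-front, before the implementation exists: a 10-year
    # annuity-certain at 3% should match the closed form (1 - v^n) / i.
    expected = (1 - 1.03 ** -10) / 0.03
    assert math.isclose(annuity_factor(n=10, i=0.03), expected, rel_tol=1e-9)

# Only after the test exists is the implementation written to satisfy it.
def annuity_factor(n: int, i: float) -> float:
    """Present value of an annuity-certain of 1 per annum for n years at rate i."""
    v = 1 / (1 + i)
    return (1 - v ** n) / i

test_annuity_factor()  # a testing framework would normally run this automatically
```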
5.3.9. Considerations for actuarial models: TDD/BDD methodologies are useful techniques to consider when developing both the conceptual models and their software implementations.
5.3.10. Whether or not such a methodology is used for developing a specific software implementation, the underlying thinking should be an important check on the use of a conceptual model. It is important that a model, or the software, is used only in an environment, and with inputs, that have been tested and for which it is known to perform as required. (This is the same point as the aerospace operating envelope.)
5.3.11. The creation of sensitivity tests and scenarios for actuarial models is a closely related practice.
5.3.12. Software development: meta data: As noted in section 4.6.3, meta data is information or data that describes other data. Meta data can be somewhat mundane, such as the count of rows and columns in a table, but even this can usefully be the foundation of important tests and controls that will be very familiar and commonplace to actuaries using many types of software.
5.3.13. The increasingly diverse sources of data being collected, transformed and stored by applications have increased the attention given to the meta data that software systems generate as they carry out their primary tasks. Although the line between what constitutes meta data and what constitutes primary data and results may be blurred, it is not necessary to draw a firm distinction between them where the information encapsulated by the meta data is useful.
5.3.14. Considerations for actuarial models: Meta data has for a long time been an important resource for controls of actuarial models, assumptions and results. As new and more complex models are developed it may be helpful to consider areas where meta data may provide additional insights into why the model has performed in the way that it has.
5.3.15. Meta data may also be designed to provide a more efficient way of analysing and comparing results between different runs of a model. For example, when seeking to evaluate the impact of a basis change on a calculation of liabilities, it may be useful to arrange that the outputs of the model provide supporting intermediate data. In this way, the impact of a change to expense assumptions for a sensitivity analysis might be shown as being isolated to the expense meta data with other meta data (e.g. number of policies, premiums, claims and investment returns) being unchanged between the two runs. If these other items were to change in response to a basis that has updated expenses, this could be an indication of a problem with the setup or the execution. (And in a more sophisticated model, where policyholder or management actions in the model are a function of expenses, further layers of meta data could be designed to provide additional insights into the way that these actions have been triggered when expenses differ.)
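A minimal sketch of that comparison is given below. The meta data items and run values are hypothetical; the check simply confirms that a basis change intended to affect only expenses has left the other meta data untouched.

```python
# Summary "meta data" emitted by two hypothetical model runs: a base run
# and an expense-sensitivity run on the same data and setup.
base_run = {"policies": 120_000, "premiums": 45.2e6, "claims": 38.9e6,
            "expenses": 6.1e6, "investment_return": 0.042}
expense_sensitivity = {"policies": 120_000, "premiums": 45.2e6, "claims": 38.9e6,
                       "expenses": 6.7e6, "investment_return": 0.042}

changed = {k for k in base_run if base_run[k] != expense_sensitivity[k]}
expected_to_change = {"expenses"}

# Any unexpected change may indicate a problem with the setup or execution.
if changed - expected_to_change:
    print(f"Possible setup problem: unexpected changes in {changed - expected_to_change}")
else:
    print("Only the expense meta data changed, as intended.")
```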