Top 3 things data science can teach actuaries

Actuaries and data scientists are similar in many ways. In a sense, actuaries were the original data scientists of the insurance industry. But over the last couple of decades, data science has advanced rapidly and leapfrogged the actuarial profession in a number of technical areas.

There is a lot that actuaries can learn from data scientists today. Equally, there is a lot that data scientists can learn from actuaries to work more effectively within insurers and pension funds.

This is the first of two blog posts. In this one, we give our top 3 lessons that actuaries should take from data scientists.

1. Validation, validation, validation

Over the last two decades, data science has developed at a rapid pace, and so have its tools and techniques. In our view, the tools that data scientists use to validate their models are now far superior to traditional actuarial tools.

Validation is particularly important when building models or running analyses on large datasets. Overfitting is the enemy: an overfitted model fits the data we already have closely, and so looks good, but does a poor job of predicting the future.

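The data science community has built a robust set of methods and practices for validating machine learning models. Cross-validation and holdout validation are techniques that every data scientist learns from the start and applies routinely. Holdout validation sets aside part of the data that the model never sees during fitting and measures performance on that unseen portion; k-fold cross-validation repeats this across k different splits, so that every observation is held out exactly once.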
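To make the idea concrete, here is a minimal, dependency-free Python sketch of k-fold cross-validation. The dataset (a hypothetical cost rising linearly with age, plus noise) and all the helper names are illustrative assumptions rather than any particular library's API; in practice most data scientists would reach for a library such as scikit-learn. The sketch compares a straight-line fit with a 1-nearest-neighbour model that simply memorises the training data.

```python
import random
from statistics import mean


def k_fold_indices(n, k, seed=0):
    """Shuffle the indices 0..n-1 and deal them into k roughly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def fit_linear(xs, ys):
    """Ordinary least squares for y = a + b * x; returns a prediction function."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    return lambda x: a + b * x


def fit_nearest_neighbour(xs, ys):
    """Memorise the training data: predict the y of the closest training x."""
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]


def mse(model, xs, ys):
    """Mean squared prediction error of `model` on the given points."""
    return mean((model(x) - y) ** 2 for x, y in zip(xs, ys))


def cross_validate(fit, xs, ys, k=5, seed=0):
    """Average out-of-fold MSE: each fold is held out once while `fit` trains on the rest."""
    scores = []
    for fold in k_fold_indices(len(xs), k, seed):
        held_out = set(fold)
        train = [i for i in range(len(xs)) if i not in held_out]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        scores.append(mse(model, [xs[i] for i in fold], [ys[i] for i in fold]))
    return mean(scores)


# Hypothetical data: a linear trend in age plus Gaussian noise (sd = 3).
rng = random.Random(42)
xs = [float(age) for age in range(20, 70)]
ys = [2.0 + 0.5 * x + rng.gauss(0, 3) for x in xs]

for name, fit in [("linear", fit_linear), ("1-nearest-neighbour", fit_nearest_neighbour)]:
    in_sample = mse(fit(xs, ys), xs, ys)
    out_of_fold = cross_validate(fit, xs, ys)
    print(f"{name:>20}: in-sample MSE {in_sample:6.2f}, 5-fold CV MSE {out_of_fold:6.2f}")
```

The nearest-neighbour model achieves a perfect in-sample error of zero because it has memorised every point, yet its cross-validated error should come out noticeably worse than the straight line's. That gap is exactly the overfitting an in-sample fit statistic hides. A single holdout split works the same way, just with one held-out fold instead of k.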

There is a lot that technical actuaries can learn from data science validation techniques. The internet makes it easy to plug into the latest developments in data science.

2. Black boxes – blessing or curse?

Ensemble machine learning models and neural nets are vastly more complex than the models most actuaries have traditionally used. Work in image recognition, audio processing, and language translation has driven the development of these complex algorithms, and we can now apply them effectively to tasks that actuaries have traditionally done.

But these algorithms are often ‘black boxes’ and are treated with scepticism by actuaries and auditors. This is with good reason, because it can be very difficult to understand what is really going on inside the algorithm. There is a valid concern that the model may be biased or non-compliant, or have hidden ‘edge cases’ where it does something strange or unpredictable.

We now know that these complex models often predict much better than the familiar traditional models. So actuaries and their employers really need to find a way to utilise them. But how do we get comfortable with the black box?

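New tools are emerging from the data science world that go a long way towards opening up the black boxes; two we like are LIME and Shapley values. LIME explains an individual prediction by fitting a simple, interpretable model to the black box's behaviour in the neighbourhood of that prediction, while Shapley values, borrowed from cooperative game theory, divide a prediction among the input features according to each feature's average marginal contribution. This holds a lot of promise and is an ongoing development that technical actuaries should be supporting and, in some cases, driving.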
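For a model with only a handful of inputs, Shapley values can be computed exactly by brute force, which makes the idea easy to inspect. The sketch below assumes a hypothetical pricing formula with an age-by-smoker interaction, and it represents an "absent" feature by substituting a single baseline value. That is one common convention, not the only one; libraries such as SHAP use approximations and background datasets, partly because exact enumeration grows exponentially with the number of features. LIME works differently (it fits a local surrogate model) and is not shown here.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values of model(x) relative to model(baseline).

    A feature outside the coalition is set to its baseline value (an assumption;
    other conventions average over a background dataset instead).
    """
    n = len(x)

    def value(coalition):
        z = [x[j] if j in coalition else baseline[j] for j in range(n)]
        return model(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions S of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                # Marginal contribution of feature i when added to coalition S.
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


def premium(z):
    """Hypothetical pricing model with an age x smoker interaction."""
    age, smoker, bmi = z
    return 100 + 2 * age + 50 * smoker + 1.5 * bmi + 0.5 * age * smoker


policyholder = [40, 1, 30]  # age, smoker flag, BMI (illustrative values)
reference = [30, 0, 25]     # baseline policyholder (illustrative values)

phi = shapley_values(premium, policyholder, reference)
for name, contribution in zip(["age", "smoker", "bmi"], phi):
    print(f"{name:>6}: {contribution:+.2f}")
print("sum of contributions:", round(sum(phi), 2))
print("premium difference:  ", premium(policyholder) - premium(reference))
```

With these made-up inputs, age contributes 22.5, smoker 67.5 and BMI 7.5, which add up to the 97.5 gap between the two premiums (the "efficiency" property of Shapley values). The interaction term is shared between age and smoker rather than being assigned arbitrarily to one of them.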

3. Democratising tools and knowledge

Almost any smart individual with a computer can start doing data science today. The knowledge has been democratised by the internet, and open-source implementations of most algorithms are freely available. Major organisations such as Google and Facebook continue to support the open-source community and publish their own algorithms. It is a very different world from the traditional closed model. Keras and PyTorch are just two of the many open-source data science projects thriving today.

In contrast, the actuarial community and modelling tools remain largely proprietary, closed and fragmented.

Where open source works, it creates a vastly larger global community of users, developers and maintainers that drives development and thinking forward. Equally, sub-standard open-source projects rapidly wither and die.

If insurers and the actuarial profession can engage a wider community to drive the development of data science tools combined with actuarial skills, this could significantly benefit both the industry and society. How might this be done?

One avenue is working out where we might open up our toolset and democratise that knowledge. Another is encouraging technical actuaries to get more actively involved in existing data science open-source communities.

Next post – top 3 things data scientists can learn from actuaries.