Data Scientists vs Data Analysts
November 5th, 2019
The last two decades have seen dramatic advances in automation, from affordable smartphones that can understand your voice commands, to self-driving cars with safety records comparable to human drivers, and computers that can diagnose disease as well as experienced doctors.
These advances have been driven not just by falling costs of computing power, but also by huge leaps forward in machine learning – techniques which automate the discovery of patterns and associations in data. The most powerful of these require minimal human expertise to guide that learning. In many cases, this means computers can discover the underlying rules and patterns in data by themselves.
Whilst the terminology has exciting connotations in science fiction, artificial intelligence, or AI, is the use of these techniques to perform tasks that we previously thought could only be done by a human – driving a car, playing chess, or recommending medication, for example.
The power of modern machine learning and AI techniques, combined with the easy availability of tools and platforms for running them, has led to a Cambrian explosion of research, startups, products and services.
A UK government report suggests AI will grow GDP by 10% in ten years.
This is exciting, and fantastic for a high-skills future digital economy.
In the enthusiasm to use new techniques, we sometimes leave behind the more sober challenge of understanding if we’re implementing them safely.
There are huge societal questions about the impact of AI. To what extent is pervasive AI removing people, and perhaps common-sense flexibility, from our daily interactions? Are social media and news platforms using algorithms that prioritise dubious content over responsible content to drive clicks, and what’s the effect of this on democracy?
Even if we can’t solve these entirely on our own, there are valid questions at the scale of our own organisations.
How do we know our exciting new AI-driven service will work for all of our users? What are the limits of its performance and accuracy? What happens when it fails? Will it fail gracefully or catastrophically?
These are basic bread-and-butter assurance questions which should be asked of any product or service.
But the additional complexity of machine learning and AI brings new challenges and risks, and the need to ask new questions.
A common failure is unintentionally teaching your machine learning system to be biased against some of your users, because the data used to train it was itself biased. Is your facial recognition system being trained on a sufficiently diverse set of faces? If not, will it impact negatively and unfairly on people from black and minority ethnic backgrounds?
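One simple way to start looking for this kind of bias is to measure accuracy separately for each group of users, rather than as a single overall figure. The following is a minimal sketch only: the function names, group labels and sample results are all hypothetical, and a gap in accuracy is just one of several possible fairness measures.

```python
from collections import defaultdict

def accuracy_by_group(records):
    """Return {group: accuracy} for (group, predicted, actual) records."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for group, predicted, actual in records:
        total[group] += 1
        if predicted == actual:
            correct[group] += 1
    return {group: correct[group] / total[group] for group in total}

def accuracy_gap(records):
    """Largest difference in accuracy between any two groups."""
    scores = accuracy_by_group(records)
    return max(scores.values()) - min(scores.values())

# Made-up evaluation results for illustration: (group, predicted, actual)
sample = [
    ("group_a", 1, 1), ("group_a", 0, 0), ("group_a", 1, 1), ("group_a", 0, 0),
    ("group_b", 1, 0), ("group_b", 0, 0), ("group_b", 1, 1), ("group_b", 0, 1),
]

print(accuracy_by_group(sample))
print(accuracy_gap(sample))
```

In this made-up sample the system is right every time for one group and only half the time for the other, a difference that an overall accuracy figure of 75% would hide.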
Because machine learning and AI techniques use advanced mathematical methods, understanding how they work, and ensuring they are used safely, requires specific expertise. Tools that are accessible and seem easy to use are not necessarily tools that everyone can use safely.
Your customers and users want to trust your business and your AI-driven products and services. This trust has to be earned, particularly in the context of recent news stories around data breaches, unethical collection and mining of data, and automated decisions biasing unfairly against users.
Your assurance framework must examine not just the use of technology and data, but also your processes and governance.
Are your testing processes sufficiently scoped? Is your machine learning development auditable? Are your tests repeatable? Could you stand up in court and credibly explain how your processes were sufficiently robust even if they led to someone being hurt?
These are standard questions for any automated service. But any AI assurance framework must also take into account the unique complexity of machine learning.
For example, many machine learning and AI techniques inherently rely on mathematical randomness. This can mean your AI-driven products and services could behave unexpectedly. Will you know when this happens before your users do? What steps will be triggered when this happens?
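To illustrate the point about randomness, the sketch below uses a stand-in for a training run whose result depends on a random seed. It is an assumption-laden toy, not a real training process: `train_and_score` and its simulated score are hypothetical. It shows two habits that help: fixing the seed so a test can be repeated exactly, and running several seeds to see how much results vary.

```python
import random
import statistics

def train_and_score(seed):
    """Hypothetical stand-in for a training run; the score depends on the seed."""
    rng = random.Random(seed)  # a local generator keeps runs independent
    # Simulated score: the mean of noisy samples around 0.9 (made-up numbers)
    samples = [rng.gauss(0.9, 0.02) for _ in range(100)]
    return statistics.mean(samples)

def score_spread(seeds):
    """Return the (lowest, highest) score seen across the given seeds."""
    scores = [train_and_score(seed) for seed in seeds]
    return min(scores), max(scores)

# The same seed gives the same result, so the test is repeatable.
print(train_and_score(42) == train_and_score(42))

# Different seeds give a range of results, which hints at how much
# the live service might vary from one training run to the next.
low, high = score_spread(range(10))
print(low, high)
```

Recording the seed alongside each result is one small step towards the auditable, repeatable testing described above.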
Another example is data mining techniques that can reveal more about users than they thought they had shared. Typically, this requires different data sets to be combined, but advanced techniques can reveal identifying patterns in single data sets too, even in apparently anonymised data. You need to consider how your data processing might affect people’s privacy.
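A basic check for this risk is to count how many records share the same combination of seemingly harmless attributes. This is the idea behind k-anonymity, a technique named here for illustration rather than one this article prescribes. The sketch below is minimal and the column names and rows are invented; passing such a check does not by itself guarantee privacy.

```python
from collections import Counter

def group_sizes(rows, quasi_identifiers):
    """Count rows sharing each combination of quasi-identifier values."""
    return Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)

def smallest_group_size(rows, quasi_identifiers):
    """The k in k-anonymity: the size of the smallest matching group."""
    return min(group_sizes(rows, quasi_identifiers).values())

def unique_rows(rows, quasi_identifiers):
    """Rows whose combination of values appears only once in the data."""
    sizes = group_sizes(rows, quasi_identifiers)
    return [
        row for row in rows
        if sizes[tuple(row[q] for q in quasi_identifiers)] == 1
    ]

# Invented "anonymised" records: no names, yet one person still stands out.
rows = [
    {"postcode_area": "AB1", "birth_year": 1980, "condition": "x"},
    {"postcode_area": "AB1", "birth_year": 1980, "condition": "y"},
    {"postcode_area": "AB1", "birth_year": 1975, "condition": "z"},
    {"postcode_area": "CD2", "birth_year": 1980, "condition": "x"},
    {"postcode_area": "CD2", "birth_year": 1980, "condition": "x"},
]

print(smallest_group_size(rows, ["postcode_area"]))
print(smallest_group_size(rows, ["postcode_area", "birth_year"]))
print(unique_rows(rows, ["postcode_area", "birth_year"]))
```

Using postcode area alone, every record hides among at least one other. Adding birth year leaves one record on its own, so anyone who knows that person's area and year of birth could learn their condition.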
Machine learning and AI bring together responsibilities around data security, privacy, ethics, law and operations. This presents a challenge to the traditional roles of CTO, CIO and head of security, where individual accountability has traditionally only covered some of these domains.
Your governance of AI shouldn’t be a copy and paste from a management handbook, but carefully designed to meet the needs of your users and match the nature of your product and services. It should be proportionate to the potential impact on users if things go wrong, and ideally evolve, not disrupt, the existing governance in your organisation.
No matter how your organisation works internally, customers and users don’t want to see diluted or confused accountability. They want a publicly visible person who has clear accountability and well-defined responsibility.
Many organisations will be on a journey towards maturity in their safe and effective use of AI. The following is a typical description of this journey.
Level 0 “Immature” – exploring machine learning tools, little understanding of data used for machine learning, premature use in live products, no understanding of the limits of their AI, no clear accountability for the impact on users.
Level 1 “Starting” – data relevant to machine learning, testing that provides some confidence, some ownership of data security and technology management by existing governance.
Level 2 “Competence” – carefully selected tools, deeper understanding of data, testing of AI accuracy, bias, and the performance limits of AI, an action plan for when automation fails, focussed accountability and clear roles and responsibilities.
Level 3 “Open and Trusted” – monitoring of emerging tools and new methods in machine learning, public transparency into data use, continuous development and testing processes, openness about limits of AI, visible plans for AI failover, and a publicly accessible, accountable senior leader.
To truly earn and develop trust, you need to have enough confidence in your technology, processes and people to be open about them, showing them working, particularly when things go wrong.
Both the UK government and Europe have responded to growing questions around the ethics and safety of AI with consultations and initial proposals for assessment frameworks.
It is expected that as the use of machine learning and AI matures, regulation and assurance standards will strengthen, particularly for industry sectors where the impact of badly managed AI is significant.
It is important not to let the mythology of machine learning and AI prevent you asking the basic questions you would ask of any organisation’s tools, processes, data and governance.
There are very good assurance frameworks for AI safety and ethics being developed by the UK’s Turing Institute and the EU, which cover common themes – human agency, technical robustness, privacy, transparency, fairness, accountability, and societal impact.
Designing your products and services around users and their needs is now established as a good philosophy. Designing your development, testing and governance around users, and their need for privacy, safety, fairness and trust, is also a good philosophy for AI-driven products and services, no matter which assurance framework you choose.
Tariq Rashid is the founder of Digital Dynamics. He has over 20 years’ experience in technology, security, and digital transformation, and is the author of a successful text on machine learning. Today, he helps organisations take safe and ethical advantage of the opportunities of AI and machine learning. www.digital-dynamics.io