Abstract
This article provides guidance on how to assess the risks connected to the use of big data and artificial intelligence in decision-making processes, and how to mitigate them.
The technical basis for this work is the methodology for auditing AI introduced by the ICO in its ‘Draft AI auditing framework’ guidance, published on 1 May 2020.
Nonetheless, the use of large amounts of data to make predictions or classifications about individuals has existed in sectors like insurance since the 19th century, long before the term ‘AI’ was coined in the 1950s.
Finally, the article illustrates best practices for leading AI implementation projects.
1. Definition
2. Distinctive aspects and risks
3. How to assess big data usage
4. Best practices
5. What is different when deploying an AI system
1. Definition
The terms ‘big data’, ‘AI’, and ‘machine learning’ are often confused with one another, even though probability, computational analysis, and the other mathematical concepts that support automation have been well established since the 19th century.
However, it is fundamental to start with the definitions of these concepts:
Most of the time, implementing a new technology, e.g. screening transactions, evaluating the results of international research, or analysing brain activity to predict behaviour, means dealing with all three of these elements.
2. Distinctive aspects and risks
Big data, AI, and machine learning are used together to deliver big data analytics, but they carry implications for data protection.
In particular, the concerns are the following:
The process involves a discovery phase over a large volume of data to find correlations, a phase that by design often has unpredictable outcomes.
More specifically, a first screening is conducted on all the available files to determine which rules should be applied to find the first correlations.
This means that only after the correlations have been identified is a new algorithm applied, in the so-called ‘application phase’.
In simple words, the machine learns which criteria are relevant during the very process of analysing the data, moving an uncontrolled amount of data as it does so.
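To make the two phases more concrete, the following is a minimal, purely illustrative sketch in Python: the dataset, feature names, and correlation threshold are invented for the example, and a real big data pipeline would be far more elaborate. The discovery phase scans all available records for correlations; the application phase then applies a rule built only from the criteria learned.

```python
# Illustrative sketch of the 'discovery' and 'application' phases.
# Dataset, feature names, and the 0.2 threshold are hypothetical.
import numpy as np

rng = np.random.default_rng(0)
features = ["amount", "hour_of_day", "merchant_risk", "account_age"]
X = rng.normal(size=(1000, len(features)))                            # historical records
y = (X[:, 0] + 0.5 * X[:, 2] + rng.normal(scale=0.5, size=1000)) > 1  # past flags

# Discovery phase: screen all available data to find correlations.
correlations = {
    name: float(np.corrcoef(X[:, i], y)[0, 1]) for i, name in enumerate(features)
}
relevant = [name for name, c in correlations.items() if abs(c) > 0.2]
print("Criteria learned during discovery:", relevant)

# Application phase: a new rule built only from the discovered criteria.
def apply_rule(record: dict) -> bool:
    """Flag a record using the criteria identified in the discovery phase."""
    return sum(record.get(name, 0.0) for name in relevant) > 1.0

print(apply_rule({"amount": 1.4, "merchant_risk": 0.3}))  # True for this example
```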
Also known as deep learning, this approach feeds a vast amount of data through a non-linear neural network, creating a ‘black box’ effect: the outcome is unpredictable, and it is impossible to analyse the reasons for the decisions made as a result of the process,
e.g. Google’s AlphaGo or other AI experiments; memorable is the experiment that the AI researchers at Google Brain conducted with three entities powered by deep neural networks: Alice, Bob, and Eve.
Traditionally, an element of control over these processes has been the choice of criteria used to screen the data, but the now common ‘n = all’ approach, in which every available record is analysed, has erased this step of control.
The computing power now available, which allows all of the data to be used at once, has made the process indiscriminate and has raised awareness of the risks.
This happens when the process uses the same set of data for other purposes, or when the same basket of information is used by a different organisation in ways that cannot be traced by the data subject.
For example, the Office for National Statistics (O.N.S.) has experimented with using geolocated Twitter data to infer people’s residence and mobility patterns, to supplement official population estimates (Swier, Nigel; Komarniczky, Bence; Clapperton, Ben. Using geolocated Twitter traces to infer residence and mobility. G.S.S. Methodology Series no. 41. O.N.S., October 2015).
To avoid a breach of the legislation, multiple evaluations must be carried out, and any re-purposing of data must be treated as a singularity, i.e. assessed on its own.
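As an illustration of treating each re-purposing as a singularity, the hypothetical sketch below records, for every dataset, the purposes that have already been assessed; any new purpose defaults to ‘not approved’ until its own evaluation has been carried out. The class names and the assessment stub are assumptions made for the example, not a prescribed procedure.

```python
# Hypothetical sketch: every re-use of a dataset is evaluated on its own,
# rather than inherited from the original collection. Purposes and the
# assessment stub are illustrative assumptions only.
from dataclasses import dataclass, field

@dataclass
class Dataset:
    name: str
    original_purpose: str
    approved_purposes: set = field(default_factory=set)

def run_compatibility_assessment(ds: Dataset, new_purpose: str) -> bool:
    # In practice this is a documented human review (e.g. a D.P.I.A.), not code.
    print(f"Assessing '{ds.name}' for the new purpose: {new_purpose}")
    return False  # default to 'not compatible' until reviewed

def evaluate_repurposing(ds: Dataset, new_purpose: str) -> bool:
    """Each new purpose triggers its own assessment (a 'singularity')."""
    if new_purpose == ds.original_purpose or new_purpose in ds.approved_purposes:
        return True
    approved = run_compatibility_assessment(ds, new_purpose)
    if approved:
        ds.approved_purposes.add(new_purpose)
    return approved

tweets = Dataset("geolocated_tweets", original_purpose="service delivery")
print(evaluate_repurposing(tweets, "population statistics"))  # False until reviewed
```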
The embedding of services in our lives and Internet of Things (I.O.T.) technologies have created a new spectrum of data collected by corporations that poses a risk to the security of personal data and deserves a proper amount of care.
New studies have shown that 61% of enterprises have an adequate level of awareness; however, few know that sensors in the street or in shops can capture the unique MAC address of the mobile phones of passers-by.
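One common technical mitigation for this kind of collection is to pseudonymise captured MAC addresses before they are stored, so the raw identifier is never retained. The sketch below illustrates the idea with a daily rotating salt; the salt scheme is an assumption for illustration only, and hashing alone does not amount to full anonymisation.

```python
# Illustrative sketch: pseudonymise captured MAC addresses before storage.
# The daily rotating salt is an assumption; real deployments need a
# documented key-management and retention policy.
import hashlib
import secrets
from datetime import date

_daily_salt = {}  # maps ISO date -> salt, regenerated each day

def pseudonymise_mac(mac: str) -> str:
    today = date.today().isoformat()
    salt = _daily_salt.setdefault(today, secrets.token_bytes(16))
    digest = hashlib.sha256(salt + mac.encode("utf-8")).hexdigest()
    return digest[:16]  # store a truncated token instead of the raw MAC

print(pseudonymise_mac("AA:BB:CC:DD:EE:FF"))
```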
The Information Accountability Foundation (Abrams, Martin. The Origins of Personal Data and its Implications for Governance. O.E.C.D., March 2014) distinguishes four types of data: provided, observed, derived, and inferred.
It is crucial to consider the implications of big data analytics for data protection, but not all big data collected are personal data.
3. How to assess big data usage
Organisations must consider whether the use of personal data in big data applications is within people’s reasonable expectations and recognise that such complexity makes it difficult for them to be transparent about the processing of personal data.
A fundamental factor in assessing such data is considering how they are used:
In the second case, organisations should consider such data as personal: e.g., the credit score given to a person in order to lower his or her credit limit.
Consistently with this view, Article 4 of the G.D.P.R. defines profiling as:
“Any form of automated processing of personal data consisting of using those data to evaluate certain personal aspects relating to a natural person, in particular, to analyse or predict aspects concerning that natural person’s performance at work, economic situation, health, personal preferences, interests, reliability, behaviour, location or movements”.
This means that if big data organisations are using personal data, then as part of assessing fairness, they need to be aware of and factor in the effects of their processing on the individuals, communities, and social groups.
Privacy impact assessments provide a structured approach to doing this: the D.P.I.A.
4. Best Practices
In the U.K., data protection is governed by the E.U. General Data Protection Regulation (G.D.P.R.) and the U.K. Data Protection Act 2018 (D.P.A. 2018).
The D.P.A. 2018 supplements the E.U. G.D.P.R. rather than enacting it, so the two laws should be read together.
Organisations in the U.K. that process personal data must comply with these two laws or risk fines of up to €20 million or 4% of annual global turnover.
Processing personal data must meet at least one of the conditions listed in the D.P.A. or in the G.D.P.R.
Under this legislation, all companies whose processing of certain types of data is likely to result in a high risk to the rights and freedoms of individuals must assess those risks by conducting a D.P.I.A.
This applies equally to big data analytics that use personal data to make automated decisions.
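A hypothetical screening helper can make this check explicit. The indicator list below loosely mirrors the triggers discussed in this article (automated decisions, large-scale profiling, systematic monitoring); it is not an authoritative legal test and would need to be defined together with the D.P.O.

```python
# Hypothetical D.P.I.A. screening sketch. The criteria loosely mirror the
# high-risk indicators discussed in the text; they are not a legal test.
HIGH_RISK_CRITERIA = {
    "automated_decision_with_legal_effect",
    "large_scale_profiling",
    "systematic_monitoring_of_public_area",
    "processing_of_special_category_data",
}

def dpia_required(processing_characteristics: set) -> bool:
    """Return True if any high-risk indicator is present in the processing."""
    return bool(HIGH_RISK_CRITERIA & processing_characteristics)

print(dpia_required({"large_scale_profiling", "marketing_analytics"}))  # True
print(dpia_required({"internal_reporting"}))                            # False
```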
Consent is one condition for processing personal data, but it is not the only condition available, and it does not have a different status from the others.
In particular, a recent report by the European Union Agency for Network and Information Security (E.N.I.S.A.) has found that: “Practical implementation of consent in big data should go beyond the existing models and provide more automation, both in the collection and withdrawal of consent.
Software agents providing consent on user’s behalf based on the properties of certain applications could be a topic to explore.
Moreover, taking into account the sensors and smart devices in big data, other types of usable and practical user positive actions, which could constitute consent (e.g. gesture, spatial patterns, behavioural patterns, motions), need to be analysed.”
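Purely as an illustration of the ‘software agent’ idea in the E.N.I.S.A. quote, the sketch below grants or withholds consent on the user’s behalf based on the properties of the requesting application and a user-defined policy. All names and rules are hypothetical.

```python
# Hypothetical consent-agent sketch: a user-defined policy is evaluated
# against the properties of the application requesting the data.
user_policy = {
    "allowed_purposes": {"navigation", "accessibility"},
    "denied_data_types": {"precise_location_history", "health"},
}

def agent_consents(app_request: dict) -> bool:
    """Grant consent only if the purpose is allowed and no denied data type is requested."""
    if app_request["purpose"] not in user_policy["allowed_purposes"]:
        return False
    if set(app_request["data_types"]) & user_policy["denied_data_types"]:
        return False
    return True

print(agent_consents({"purpose": "navigation", "data_types": ["coarse_location"]}))   # True
print(agent_consents({"purpose": "advertising", "data_types": ["coarse_location"]}))  # False
```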
If, however, a company is using social media data to profile individuals, e.g. for recruitment purposes or for assessing insurance or credit risk, it needs to ensure it has a data protection condition for processing the data: lawfulness, fairness, and transparency.
Section 4 of the G.D.P.R. (‘Right to object and automated individual decision-making’) allows any data subject to exercise control over the process.
Legitimate interest is another condition, set out in Schedule 2, paragraph 6 of the D.P.A.: the processing is necessary for the legitimate interests of the organisation collecting the data (or of others to whom it is made available).
This means a big data organisation relying on legitimate interests must balance those interests against the interests, rights, and freedoms of the individuals whose data it processes.
Fairness is a crucial factor in determining whether big data analysis is incompatible with the original processing purpose.
If an organisation is buying personal data from elsewhere for big data analytics, it needs to exercise due diligence: it must assess whether the new processing is incompatible with the original purpose for which the data was collected, and check whether it needs to seek further consent or provide a new privacy notice.
Organisations will need to have appropriate processes to deal with the G.D.P.R.’s extension of rights regarding decisions based on automated processing.
A further accountability provision under the G.D.P.R. is the requirement to appoint a Data Protection Officer (D.P.O.).
This will be a necessity for organisations that systematically monitor individuals on a large scale, including those using big data analytics for purposes such as online behaviour tracking or profiling through:
In respect of the use of those data by a third party, Schedule 2, Part 1, paragraph 2 allows an organisation to disclose personal data to a third party in circumstances where that organisation would otherwise be prevented from doing so by the G.D.P.R., where:
In those cases, applying the S.P.A.R.C. methodology, as suggested by the Insurance Fraud Bureau, should be considered best practice.
5. What is different when deploying an AI system
AI systems use data in a different way, and some definitions acquire a different meaning. For example, statistical accuracy refers to how often the classifications made by the algorithm governing the AI are correct, whereas in a standard data flow accuracy relates to the correctness of the data the organisation processes concerning an individual.
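A short sketch can make the contrast concrete: statistical accuracy measures the share of the algorithm’s classifications that are correct, while accuracy in the data protection sense asks whether the records held about an individual are factually correct and up to date. All values below are invented.

```python
# Illustrative contrast between the two meanings of 'accuracy'; values are invented.

# 1. Statistical accuracy: share of the classifier's outputs that are correct.
predictions  = [1, 0, 1, 1, 0, 1, 0, 0]
ground_truth = [1, 0, 0, 1, 0, 1, 1, 0]
statistical_accuracy = sum(p == t for p, t in zip(predictions, ground_truth)) / len(predictions)
print(f"Statistical accuracy of the model: {statistical_accuracy:.0%}")  # 75%

# 2. Data accuracy (data protection sense): is the record about the individual correct?
record = {"name": "A. Sample", "address_up_to_date": False, "date_of_birth": "1990-01-01"}
print("Record about the individual is accurate:", record["address_up_to_date"])
```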
It is important to recognise that, to deploy an AI system, two fundamental and separate phases must be considered:
The first one is an internal process.
The second one has a direct impact on the public at large; only in this second case are companies legally required to complete a D.P.I.A. if they are using an AI system that processes personal data.
A D.P.I.A. offers the opportunity to consider how and why you are using AI systems to process personal data, and the potential risks involved.
Using AI to process personal data is likely to result in a high risk to individuals’ rights and freedoms, and therefore triggers the legal requirement to undertake a D.P.I.A.
If the result of an assessment indicates a high residual risk to individuals that you cannot sufficiently reduce, you must consult with the ICO before starting the processing.
Where necessary, you should consider how to capture additional factors for consideration by the human reviewers.
For example, they might interact directly with the person the decision is about, to gather such information.
Those in charge of designing the front-end interface of an AI system must understand the needs, thought processes, and behaviours of human reviewers, so that those reviewers can intervene effectively.
It may, therefore, be helpful to consult and test options with human reviewers early on.
Therefore, your risk management policies should establish a robust, risk-based, and independent approval process for each processing operation that uses AI.
Accountability requires the company to:
The assessment cannot be delegated to data scientists or engineering teams.
Senior management, including Data Protection Officers (D.P.O.s), are accountable for understanding and addressing the risks appropriately and promptly.
The senior manager must:
As stated, the D.P.I.A. process will help identify the relevant risks, but it is good practice to assign a score or level to each risk, measured against the likelihood and the severity of the impact on individuals.
Against each identified risk, you should consider options to reduce the level of assessed risk further.
Examples of this could be data minimisation or providing opportunities for individuals to opt out of the processing.
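A minimal sketch of this scoring practice is shown below; the scales, thresholds, example risks, and mitigations are assumptions for illustration, not prescribed values.

```python
# Illustrative risk register: score each risk by likelihood x severity and
# record a mitigation. Scales (1-3) and thresholds are assumptions.
RISKS = [
    {"risk": "re-identification of individuals",                "likelihood": 2, "severity": 3},
    {"risk": "discriminatory outcome of an automated decision", "likelihood": 3, "severity": 3},
    {"risk": "excessive data collection",                       "likelihood": 3, "severity": 2},
]

MITIGATIONS = {
    "excessive data collection": "data minimisation",
    "discriminatory outcome of an automated decision": "human review and opt-out",
}

for r in RISKS:
    score = r["likelihood"] * r["severity"]            # 1 (low) .. 9 (high)
    level = "high" if score >= 6 else "medium" if score >= 3 else "low"
    mitigation = MITIGATIONS.get(r["risk"], "to be defined")
    print(f"{r['risk']}: score {score} ({level}), mitigation: {mitigation}")
```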
It is essential that D.P.O.s and other information governance professionals are involved in AI projects from the earliest stages. Management should record in the D.P.I.A. the measures chosen to reduce or eliminate each risk.
Finally, the D.P.I.A. can also evidence the organisational measures put in place, such as:
Any time the same infrastructure uses a given set of data for a different task, the senior manager must review the D.P.I.A. to evaluate whether a different lawful basis applies.
The same practice should be applied when a third party is implementing the project.
After the project has been deployed, it is fundamental to implement monitoring activities to determine:
Because AI automates decision-making, it should always be considered whether the outcome is free of bias and discrimination; if it is not, this should be corrected as soon as possible.
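One simple monitoring check of this kind is to compare the rate of favourable outcomes across groups and flag any disparity for investigation. In the sketch below, the groups, decisions, and disparity threshold are hypothetical.

```python
# Illustrative monitoring sketch: compare approval rates across groups and
# flag a disparity above a chosen threshold for human review.
from collections import defaultdict

decisions = [
    {"group": "A", "approved": True},  {"group": "A", "approved": True},
    {"group": "A", "approved": False}, {"group": "B", "approved": True},
    {"group": "B", "approved": False}, {"group": "B", "approved": False},
]

totals, approvals = defaultdict(int), defaultdict(int)
for d in decisions:
    totals[d["group"]] += 1
    approvals[d["group"]] += d["approved"]

rates = {g: approvals[g] / totals[g] for g in totals}
print("Approval rates by group:", rates)

if max(rates.values()) - min(rates.values()) > 0.2:   # hypothetical threshold
    print("Disparity above threshold: review the model for possible bias.")
```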
In conclusion, deploying a project with embedded AI technology falls under the same applicable legislation, but it does not present the same challenges.
The volume of data, the arbitrariness of the decisions, and the unpredictability of the outcomes are aspects that require particular attention, and they cannot be left solely in the hands of technical personnel.
At this stage, the technology still needs constant monitoring of its usage to ensure that all aspects remain under the control of the senior manager responsible.
May 2020
by Daniele Lupi