Data Privacy

 Abstract


The article contains guidance on how to assess the risk connected to the use of big data and artificial intelligence in the decision-making process, and how to mitigate it.


The technical basis for this work is the methodology for auditing AI introduced by the ICO in its paper on the ‘Draft AI auditing framework’, published on 1 May 2020.


Nonetheless, the use of large amounts of data to make predictions or classifications about individuals has existed in sectors like insurance since the 19th century, long before the term ‘AI’ was coined in the 1950s.


In conclusion, the article illustrates best practices for leading AI implementation projects.


1. The definition

2. Distinctive aspects and risks

3. How to assess big data usage

4. Best Practices

5. What differs when deploying an AI system



1. Definition 


The terms ‘big data’, ‘AI’, and ‘machine learning’ are often confused with one another, but probability, computational analysis, and the other mathematical concepts that underpin automation have been well established since the 19th century.


However, it is fundamental to start with the definitions of these concepts:


  • Big data is high-volume, high-velocity and/or high-variety information, often captured by sensors embedded in our daily life, that demands cost-effective and innovative forms of information processing; such processing enables enhanced insight, decision making, and process automation.
  • AI is the analysis of data to model some aspect of the world. Inferences from these models are then used to predict and anticipate possible future events.
  • Machine Learning is the set of techniques and tools that allow computers to ‘think’ by creating mathematical algorithms based on accumulated data.


Most of the time, dealing with a new technology implementation, e.g. screening transactions, evaluating the results of international research, or analysing brain activity to predict behaviour, means dealing with all three of these elements.



2. Distinctive aspects and risks


Big Data, AI, and machine learning are used to deliver big data analytics, but they have implications for data protection. 


In particular, the concerns relate to:


  • The use of algorithms 
  • The opacity of the processing 
  • The tendency to collect ‘all the data’
  • The re-purposing of data, and
  •  The use of new types of data


  • The use of algorithms 


The process involves a discovery phase over a large amount of data to find correlations, and it often has unpredictable outcomes by design.


More specifically, a first screening is conducted on all available files to determine which rules should be applied to find the first correlations.

This means that only after the correlations have been identified is a new algorithm applied, in the so-called ‘application phase’.


In simple words, the machine learns which criteria are relevant during the very process of analysing the data, moving through an uncontrolled amount of it.
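
As a rough illustration only (not the ICO's or the author's method), a minimal Python sketch of the two phases, assuming a hypothetical tabular dataset with a 'flagged' outcome column; the file names, column name and model choice are all invented:

# Minimal sketch of the two phases described above. The file names, the
# 'flagged' column and the choice of model are hypothetical assumptions.
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Discovery phase: screen all available records for correlations with the
# outcome of interest, letting the data suggest the relevant criteria.
records = pd.read_csv("transactions.csv")
correlations = records.corr(numeric_only=True)["flagged"].abs()
criteria = correlations.drop("flagged").nlargest(5).index.tolist()
print("Criteria discovered:", criteria)

# Application phase: only after the correlations are known is a new
# algorithm applied, using the discovered criteria as its inputs.
model = LogisticRegression(max_iter=1000).fit(records[criteria], records["flagged"])
new_cases = pd.read_csv("new_transactions.csv")
predictions = model.predict(new_cases[criteria])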


  • The opacity of the processing 


Also known as deep learning, this involves passing a vast amount of data through a non-linear neural network, creating a black-box effect: the outcome is unpredictable, and it is impossible to analyse the reasons for the decisions made as a result of the process.

E.g. Google's AlphaGo, or other ‘AI battles’: memorable is the experiment that the AI researchers at Google Brain conducted with three entities powered by deep neural networks: Alice, Bob, and Eve.


  • Using all the data


Traditionally, an element of control over these processes has been the criteria used to screen the data, but the now common ‘n = all’ approach has erased this step of control.


The computing power now available, which allows all the available data to be used at once, has made the individual processing steps harder to distinguish and has heightened awareness of the risk.


  • Re-purposing data 


This happens when the process uses the same set of data for other purposes, or when the same basket of information is used by a different organisation in a way that cannot be traced by the user.


For example, the Office for National Statistics (O.N.S.) has experimented with using geolocated Twitter data to infer people's residence and mobility patterns, to supplement official population estimates (Swier, Nigel; Komarniczky, Bence and Clapperton, Ben. Using geolocated Twitter traces to infer residence and mobility. G.S.S. Methodology Series no. 41. O.N.S., October 2015).


To avoid a breach of the legislation, multiple evaluations must be carried out, and any re-purposing of data should be considered as a separate processing operation in its own right.


  • New types of data 


The embedding of services in our lives and the Internet of Things (I.o.T.) have created a new spectrum of data collected by corporations that poses a risk to the security of personal data and deserves a proper amount of care.


However, recent studies have shown that while 61% of enterprises have an adequate level of awareness, few know that sensors in the street or in shops can capture the unique MAC address of the mobile phones of passers-by.


The Information Accountability Foundation (Abrams, Martin. The origins of personal data and its implications for governance. O.E.C.D., March 2014) distinguishes four types of data:


  • Provided data
    • consciously given by individuals, e.g., when filling in an online form.
  • Observed data
    • recorded automatically, e.g., by online cookies, sensors, or CCTV linked to facial recognition.
  • Derived data
    • produced from other data in a relatively simple and straightforward fashion, e.g. calculating customer profitability from the number of visits to a store and the items bought.
  • Inferred data
    • produced by using a more complex method of analytics to find correlations between datasets and to profile people, e.g. credit scores or predicted health outcomes. These are less ‘certain’ than derived data.
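
To make the distinction concrete, a minimal sketch with invented figures of how a ‘derived’ data point such as customer profitability could be calculated; all values are hypothetical:

# Hypothetical illustration of derived data: a simple, deterministic
# calculation from data the organisation already holds.
visits_per_year = 24        # observed data: store visits
items_per_visit = 3.5       # observed data: average basket size
margin_per_item = 1.80      # internal figure, in GBP

# Derived data point: estimated annual profitability of this customer.
annual_profitability = visits_per_year * items_per_visit * margin_per_item
print(f"Derived profitability: £{annual_profitability:.2f}")   # £151.20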


It is crucial to consider the implications of big data analytics for data protection, but not all big data collected are personal data.



3. How to assess big data usage


Organisations must consider whether the use of personal data in big data applications is within people’s reasonable expectations and recognise that such complexity makes it difficult for them to be transparent about the processing of personal data.


A fundamental factor in assessing these data is to consider whether they are used:


  1. Purely for research purposes
  2. To make decisions affecting individuals


In this second case, they should consider such data as personal: e.g., the credit score given to a person to lower his or her credit limit.


In line with this view, Article 4 of the G.D.P.R. defines ‘profiling’ as:

“Any form of automated processing of personal data consisting of using those data to evaluate certain personal aspects relating to a natural person, in particular, to analyse or predict aspects concerning that natural person’s performance at work, economic situation, health, personal preferences, interests, reliability, behaviour, location or movements”.


This means that if big data organisations are using personal data, then as part of assessing fairness, they need to be aware of and factor in the effects of their processing on the individuals, communities, and social groups.


Privacy impact assessments provide a structured approach to doing this: the D.P.I.A.



4. Best Practices 


In the U.K., data protection is governed by the E.U. General Data Protection Regulation and the U.K. Data Protection Act 2018.


The D.P.A. 2018 supplements the E.U. G.D.P.R. rather than enacting it, so the two laws should be read together.


Organisations in the U.K. that process personal data must comply with these two laws or risk fines of up to €20 million or 4% of annual global turnover.


The processing of personal data must meet one of the conditions listed in either the D.P.A. or the G.D.P.R.


Under this legislation, all companies whose processing of data is likely to pose a high risk to the rights and freedoms of individuals must assess those risks by conducting a D.P.I.A.


This applies equally to big data analytics that use personal data to make automated decisions.


Consent is one condition for processing personal data, but it is not the only condition available, and it does not have any different status than the others.


In particular, a recent report by the European Union Agency for Network and Information Security (E.N.I.S.A.) has found that: “Practical implementation of consent in big data should go beyond the existing models and provide more automation, both in the collection and withdrawal of consent. 


Software agents providing consent on user’s behalf based on the properties of certain applications could be a topic to explore. 


Moreover, taking into account the sensors and smart devices in big data, other types of usable and practical user positive actions, which could constitute consent (e.g. gesture, spatial patterns, behavioural patterns, motions), need to be analysed.”


However, if a company is using social media data to profile individuals, e.g., for recruitment purposes or for assessing insurance or credit risk, it needs to ensure it has a data protection condition for processing the data: lawfulness, fairness, and transparency.


Section 4 of the G.D.P.R. (‘Right to object and automated individual decision-making’) allows any data subject to control the process.


Legitimate interest is another condition, set out in Schedule 2, paragraph 6 of the D.P.A., which applies where the processing is necessary for the legitimate interests of the organisation collecting the data (or of others to whom it is made available).


This means a big data organisation:


  • must have a framework of values against which to test the proposed processing,
  • must have a method of carrying out the assessment and keeping the processing under review, and
  • must be able to demonstrate this, in case of objections by the data subjects or investigations by the regulator.


Fairness is a crucial factor in determining whether big data analysis is incompatible with the original processing purpose.


If an organisation is buying personal data from elsewhere for big data analytics, it needs to exercise due diligence.


It must assess whether the new processing is incompatible with the original purpose for which the data was collected, as well as check whether it needs to seek further consent or provide a new privacy notice.


Organisations will need to have appropriate processes to deal with the G.D.P.R.’s extension of rights regarding decisions based on automated processing.


A further accountability provision under the G.D.P.R. is the requirement to appoint a data protection officer (D.P.O.).


This will be a necessity for organisations that systematically monitor individuals on a large scale, including those using big data analytics for purposes such as online behaviour tracking or profiling, which should be kept under control through:


  • Preventative controls 
    • Designed to stop errors or risks from happening
  • Detective controls
    •  Designed to find errors after they have occurred
  • Corrective Controls
    •  Designed to correct any errors found by the detective controls and mitigate the impact of the error.


In respect of the usage of those data by a third party, Schedule 2, Part 1, Paragraph 2 of the D.P.A. 2018 allows an organisation to disclose personal data to a third party, in circumstances where that organisation would otherwise be prevented from doing so by the G.D.P.R., where the disclosure relates to:


  • the prevention or detection of a crime
  • the apprehension or prosecution of offenders
  • the assessment or collection of any tax or duty or any imposition of a similar nature, (the “Purposes”)
  • circumstances in which the application of the non-disclosure provisions would prejudice any of the three Purposes mentioned above.


In those cases, applying the S.P.A.R.C. methodology, as suggested by the Insurance Fraud Bureau, should be considered best practice.



5. What differs when deploying an AI system


AI systems use data in a different way, and some definitions acquire a different meaning. E.g., statistical accuracy refers to how often the classifications made by the algorithm governing the AI are correct, whereas in a standard data flow accuracy relates to the accuracy of the data the organisation processes concerning an individual.
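
A minimal, hypothetical illustration of statistical accuracy in this first sense, i.e. the share of the algorithm's classifications that turn out to be correct (the labels below are invented):

# Hypothetical labels: 1 = 'high risk', 0 = 'low risk'.
true_outcomes   = [1, 0, 0, 1, 1, 0, 0, 0, 1, 0]
model_decisions = [1, 0, 1, 1, 0, 0, 0, 0, 1, 0]

# Statistical accuracy: fraction of decisions that match the true outcome.
correct = sum(t == d for t, d in zip(true_outcomes, model_decisions))
accuracy = correct / len(true_outcomes)
print(f"Statistical accuracy: {accuracy:.0%}")   # 80%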


It is important to recognise that, to deploy an AI system, two fundamental and separate phases must be considered:


  1. Training the AI system to develop a statistical model
  2. Deploying the model to make predictions in real situations.


The first one is an internal process. 


The second one has a direct impact on the public at large; only in this second case are companies legally required to complete a D.P.I.A., if they are using an AI system that processes personal data.
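
A minimal sketch of the two phases, assuming a hypothetical historic dataset with an 'outcome' column; the file names and model choice are illustrative assumptions, not a prescribed approach:

# Phase 1 - training: an internal process that produces a statistical model.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

historic = pd.read_csv("historic_applications.csv")   # hypothetical file
X_train, X_test, y_train, y_test = train_test_split(
    historic.drop(columns=["outcome"]), historic["outcome"], test_size=0.2)
model = RandomForestClassifier().fit(X_train, y_train)
print("Hold-out accuracy:", model.score(X_test, y_test))

# Phase 2 - deployment: predictions now affect real individuals, so a
# D.P.I.A. is required before this step if personal data are processed.
live_cases = pd.read_csv("live_applications.csv")     # hypothetical file
decisions = model.predict(live_cases)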


A D.P.I.A. offers the opportunity to consider how and why you are using AI systems to process personal data, and the potential risks involved.


Using AI to process personal data is likely to result in a high risk to individuals’ rights and freedoms, and therefore triggers the legal requirement to undertake a D.P.I.A.


If the result of an assessment indicates a high residual risk to individuals that you cannot sufficiently reduce, you must consult with the ICO before starting the processing.


Where necessary, you should consider how to capture additional factors for consideration by the human reviewers.


For example, they might interact directly with the person the decision is about, to gather such information.


Those in charge of designing the front-end interface of an AI system must understand the needs, thought processes, and behaviours of human reviewers, so that the reviewers can intervene effectively.


It may, therefore, be helpful to consult and test options with human reviewers early on.


Therefore, your risk management policies should establish a robust, risk-based, and independent approval process for each processing operation that uses AI.


Accountability requires the company to:


  •  be responsible for the compliance of your system
  •  assess and mitigate the risk
  • document and demonstrate how your system is compliant 
  • justify the choices you have made.


The assessment cannot be delegated to data scientists or engineering teams.


Senior management, including Data Protection Officers (D.P.O.s), are accountable for understanding and addressing these risks appropriately and promptly.


The senior manager must:


  • assess the risks to individuals’ rights that your use of AI poses
  • determine how you need to address these
  • establish the impact this has on your use of AI


As stated, the D.P.I.A. process will help identify the relevant risks, and it is good practice to assign a score or level to each risk, measured against the likelihood and the severity of the impact on individuals.
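
For illustration only, a minimal risk-scoring sketch where the score is likelihood multiplied by severity on a 1-5 scale; the risks, scales, and thresholds are assumptions, not values prescribed by the G.D.P.R. or the I.C.O.:

# Hypothetical risk register: score each risk by likelihood x severity.
risks = {
    "inaccurate automated credit decision": {"likelihood": 3, "severity": 4},
    "re-identification of individuals":     {"likelihood": 2, "severity": 5},
    "discriminatory outcome":               {"likelihood": 3, "severity": 5},
}

for name, r in risks.items():
    score = r["likelihood"] * r["severity"]
    level = "high" if score >= 15 else "medium" if score >= 8 else "low"
    print(f"{name}: score {score} ({level})")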


Against each identified risk, you should consider options to reduce the level of assessed risk further. 

Examples of this could be data minimisation or providing opportunities for individuals to opt-out of the processing.


It is essential that the D.P.O. and other information governance professionals are involved in AI projects from the earliest stages. Management should record in the D.P.I.A. the measures chosen to reduce or eliminate each risk.


Finally, the D.P.I.A. can also evidence the organisational measures put in place, such as:


  • appropriate training, to mitigate risks associated with human error
  • documentation of any technical measures designed to reduce risks to the security and accuracy of personal data processed in the AI system
  • what additional steps you plan to take
  • whether each risk has been eliminated, reduced, or accepted
  • the opinion of your D.P.O., if you have one
  • whether you need to consult the I.C.O.


Any time the same infrastructure uses a given set of data for a different task, the senior manager must review the D.P.I.A. to evaluate whether a different lawful basis applies.


The same practice should be applied when a third party is implementing the project.


After the project has been deployed, it is fundamental to implement monitoring activities that define:


  1. The monitoring frequency, proportional to the impact an incorrect output may have on individuals
  2. Regular reviews of the statistical accuracy measures, to mitigate the risk of concept drift
  3. Periodic updates and reviews of the statistical accuracy measures themselves
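
As an illustrative sketch of point 2 above, a monitoring check that compares live accuracy with the level measured before deployment and flags possible concept drift; the baseline, tolerance, and example figures are assumptions:

# Hypothetical thresholds for a periodic accuracy review.
BASELINE_ACCURACY = 0.90   # accuracy measured before deployment (assumed)
DRIFT_TOLERANCE   = 0.05   # acceptable drop before escalation (assumed)

def check_for_drift(predictions, confirmed_outcomes):
    # Compare recent predictions with the outcomes later confirmed.
    correct = sum(p == o for p, o in zip(predictions, confirmed_outcomes))
    live_accuracy = correct / len(predictions)
    if live_accuracy < BASELINE_ACCURACY - DRIFT_TOLERANCE:
        # In practice this should trigger a review by the senior manager
        # responsible, and potentially an update of the D.P.I.A.
        print(f"Possible concept drift: live accuracy {live_accuracy:.0%}")
    return live_accuracy

# Example monthly check with invented figures.
check_for_drift([1, 0, 1, 1, 0, 1, 0, 0], [1, 0, 0, 1, 0, 1, 1, 0])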


Because AI automates decision-making, it should always be considered whether the outcome is free from bias and discrimination; where it is not, this should be corrected as soon as possible.


In conclusion, deploying a project that embeds AI technology is treated in the same way under the applicable legislation, but it does not present the same challenges.


The volume of data, the arbitrariness of the decisions, and the unpredictability of the outcomes are aspects that require particular attention, and they cannot be left solely in the hands of the technical staff.


At this stage, the technology still needs constant monitoring of its usage to ensure that all aspects remain under the control of the senior manager responsible.


May 2020

by Daniele Lupi

