Data protection by design when procuring intelligent solutions
The principles of data protection by design have been around for a long time, but they did not become a legal requirement until the GDPR entered into force in 2018. The requirement is based on the idea that a system must have data protection measures built in from the beginning if its user is to be able to comply with data protection rules. Tacking on measures to “fix” data protection problems after the fact would, in the long run, result in solutions that protect personal data less well than those with these features built in by design.
Who is responsible for ensuring data protection by design?
Compliance with Article 25 of the GDPR requires that programmers and developers implement suitable technical solutions. Only those who are developing the solution are able to include good data protection measures from the beginning. Even so, the responsibility for compliance with this requirement does not rest with the software provider or the data processor, but with the data controller.
See the Data Protection Authority’s guide to data protection by design.
This is how the Regulation works. The data controller is responsible for the processing of personal data (see Article 5(2)). Responsible data controllers will demand good data protection measures when they purchase products. In this way, providers who can guarantee good data protection by design will have a competitive advantage. This requires that the data controller purchasing the machine learning solution is knowledgeable about:
- data protection by design
- machine learning
- contracts
- procurement rules (for data controllers who are subject to the Public Procurement Act)
What should be included by design?
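Under Article 25, the system should have built in, by design, measures that implement the data protection principles and the safeguards needed to meet the requirements of the GDPR and protect the rights of data subjects.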
This includes everything from abstract requirements, such as fairness, to more practical requirements, such as procedures for deletion. So when has a provider succeeded in including data protection by design? The law does not explicitly define what counts as appropriate technical and organizational measures. The Data Protection Authority’s guide to data protection by design emphasizes that the measures must be capable of effectively ensuring data protection, and it provides examples of key elements for each data protection principle. The data controller must decide which measures, and which level of measures, are required by considering the following factors:
- the state of the art in technology
- implementation costs
- the nature, scope, purpose and context of the processing
- the risk to the rights and freedoms of data subjects – likelihood and severity
In other words, Article 25 of the GDPR provides that the context of the processing shall be a determining factor in how and to what degree data protection is to be included in the solution by design. All of the GDPR’s provisions shall nevertheless apply, but how a specific procedure for deletion is built into the solution must be determined on the basis of what is the most suitable approach.
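As an illustration of what a deletion procedure "built in by design" might look like, the sketch below sorts records by whether their retention period has run out. It is a minimal sketch under stated assumptions, not a description of any specific product: the `Record` type, the category names and the retention periods are all hypothetical, and real retention periods must follow from the purpose of the processing and the applicable law.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class Record:
    record_id: str
    category: str          # hypothetical category label
    collected_at: datetime


# Hypothetical retention periods per category (assumption for illustration).
RETENTION = {
    "job_application": timedelta(days=365),
    "access_log": timedelta(days=90),
}


def split_by_retention(records, now):
    """Sort records into (kept, expired, needs_review).

    A record is expired when it is older than the retention period for its
    category. Records whose category has no defined retention period are
    not deleted silently; they are returned for manual review instead.
    """
    kept, expired, needs_review = [], [], []
    for record in records:
        period = RETENTION.get(record.category)
        if period is None:
            needs_review.append(record)
        elif now - record.collected_at > period:
            expired.append(record)
        else:
            kept.append(record)
    return kept, expired, needs_review


if __name__ == "__main__":
    now = datetime(2024, 1, 1, tzinfo=timezone.utc)
    sample = [
        Record("a", "access_log", now - timedelta(days=100)),
        Record("b", "access_log", now - timedelta(days=10)),
    ]
    kept, expired, needs_review = split_by_retention(sample, now)
    print([r.record_id for r in expired])
```

In a procurement setting, the point is less the code itself than the questions it raises for the provider: whether retention periods are configurable per category, whether deletion runs automatically, and what happens to data that has no defined retention period.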
Data protection by design and machine learning
Machine learning may complicate the assessment of which measures would be best suited for ensuring data protection by design. The reason for this is that machine learning challenges data protection and privacy in the following ways:
- Most buyers acquire algorithms that are already developed (off the shelf) and adapt them for various purposes
- Training machine learning solutions requires vast quantities of data
- We still know little about how machine learning may affect individual rights and freedoms, or about the unexpected consequences it may have, such as discrimination and lack of transparency
Machine learning algorithms may be extremely opaque. It can be difficult to explain the logic on which the results of a machine learning solution are based – this is often referred to as a black box problem.
If the risk to the data subject’s rights and freedoms is related to discrimination, for example, the measure must reduce the risk of discrimination. This means the algorithms must be designed so as to prevent discrimination.
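One way a buyer might ask a provider to document the risk of discrimination is through a simple outcome comparison across groups. The sketch below computes the share of favourable decisions per group and the largest gap between groups (often called the demographic parity difference). This is only one of several possible fairness measures and is not prescribed by the GDPR or by this report; the function names and the sample data are hypothetical. A small gap does not by itself show that a solution is free of discrimination.

```python
from collections import defaultdict


def selection_rates(outcomes):
    """Return the share of favourable decisions per group.

    `outcomes` is an iterable of (group, decision) pairs, where
    `decision` is True for a favourable outcome.
    """
    totals = defaultdict(int)
    favourable = defaultdict(int)
    for group, decision in outcomes:
        totals[group] += 1
        if decision:
            favourable[group] += 1
    return {group: favourable[group] / totals[group] for group in totals}


def demographic_parity_difference(outcomes):
    """Return the gap between the highest and lowest group selection rate."""
    rates = selection_rates(outcomes)
    if not rates:
        raise ValueError("no outcomes supplied")
    return max(rates.values()) - min(rates.values())


if __name__ == "__main__":
    # Hypothetical decisions for two groups, for illustration only.
    sample = [
        ("A", True), ("A", True), ("A", False), ("A", False),
        ("B", True), ("B", False), ("B", False), ("B", False),
    ]
    print(selection_rates(sample))
    print(demographic_parity_difference(sample))
```

Which groups to compare, and what size of gap calls for further measures, depends on the context of the processing and has to be assessed by the data controller.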
Machine learning solutions are often bought as so-called “off-the-shelf” products. This means that the solution has been developed by someone other than the data controller for the particular use of the tool. The buyer will be responsible for acquiring a solution that has data protection by design. The buyer (data controller) may appoint a data processor to guarantee compliance, but in order to select appropriate measures at the appropriate level, the data controller must also understand the impact the solution will have on the privacy of the data subjects. Because machine learning solutions are often complex and difficult to explain, this will pose a greater challenge for data controllers than technology that does not include machine learning.
In the sandbox project, we wanted to highlight how data protection by design could be ensured in connection with the procurement of intelligent solutions. In order to gain an understanding of what public bodies need help with, we chose to conduct a case study to map the landscape and prepare some basic recommendations.
How did we prepare our recommendations?
In order to clarify which information public-sector bodies need to make good decisions in their procurement of machine learning solutions, the project group decided to conduct an interview with a public body, NVE. The purpose of this interview was to establish an overview of the needs public-sector bodies have for information about data protection by design. Based on this interview, we chose to prepare recommendations with points public-sector bodies can follow up on. NVE was also invited to provide feedback on an early draft of these recommendations, to see if they met their needs. In addition to NVE, representatives from the police service and Simplifai also provided feedback on the draft recommendations.
NVE was chosen as a case study because they had already been in contact with Simplifai about their archive system. As a public-sector body, NVE is an interesting choice for a case study because their activities do not entail broad processing of personal data, compared to, say, a municipal authority. Even so, NVE does need to process some personal data about its employees, and DAM would be part of that. NVE is also a public body with experience of developing its own proprietary solutions, which means they would have an interesting perspective in a procurement situation such as this one.
Feedback from NVE after the interview indicates that there is a considerable need for guidance on data protection in connection with the procurement of intelligent solutions. The other workshops in the sandbox about data protection by design also confirm the need for guidance.
Information about data protection by design is also provided by the Data Protection Authority in its activities outside the sandbox – in supervisory activities and in monitoring compliance with the GDPR. We see two factors, in particular, driving the considerable need for information.
Why is the need for information so great? Our findings
The first is that data protection by design seems to be associated exclusively with information security. Information security has been explored much more thoroughly than data protection, which may be part of the reason. We see a demand for advice on how organizations can request information and set requirements that go beyond information security when they assess a solution’s data protection by design. These experiences are shared by the Privacy Commission.
The second is that machine learning and artificial intelligence are so complex that most organizations without specific technical expertise lose sight of which measures are considered “appropriate”. The gap in technological expertise between the developer of the tool and the data controller is even greater in machine learning than in other contexts. This gives providers a key role in providing information about the product. Our impression is that customers have so far requested data protection by design only to a very limited degree for off-the-shelf products based on machine learning.
Background for choices
Because both data protection by design and machine learning are complex and, to a certain extent, new concepts, our recommendations are somewhat general for the time being. We have nevertheless included examples of the types of requirements that can be placed on providers of machine learning tools, because they are especially relevant for machine learning solutions.