Recommendations for data protection by design in the procurement of solutions based on machine learning
Build competence
Employee training is a key measure for ensuring data protection by design in practice. Build employee competence with regard to data protection by design, machine learning, contracts and procurement.
Both the data security officer and the data protection officer can be good resources in procurement processes involving machine learning solutions, and should be involved as early as possible.
Here is a list of useful sources:
Data protection by design
This guide covers all of the data protection principles and provides simple recommendations for how these can be included by design. The guide provides a good description of the requirement for data protection by design.
This guide goes through the different stages of the development process and describes how data protection can be included by design. It is written for people with technical knowledge, but everyone can benefit from the practical recommendations for how to include data protection by design.
Machine learning
An easily understandable report describing artificial intelligence and how it relates to privacy.
A comprehensive and well-designed online course on artificial intelligence for those wishing to learn more about the mechanisms behind it.
A comprehensive report on artificial intelligence from a technical perspective. The report also covers potential challenges posed by machine learning.
How to define requirements and perform evaluations in a procurement process
General information about procurement, especially how a public procurer can go about collecting information about products and defining requirements.
Consider: Is machine learning the most appropriate approach?
Which need is the machine learning tool intended to meet? Our recommendation is to consider the context in which you will be processing personal data and to consider whether a rule-based tool can achieve a more privacy-friendly solution than a machine learning tool. If, for example, you are using the tool to support decision-making processes that affect citizens, you must define different requirements for the tool than if you are using it to make internal decision-making processes more efficient.
Rule-based tools
Rule-based tools are tools where the algorithm is static and based on fixed rules, as opposed to algorithms that are dynamic and make predictions based on patterns in the source data.
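To make this distinction concrete, here is a minimal sketch in Python with scikit-learn of the same eligibility decision implemented first as a fixed rule and then as a learned model. All field names, thresholds and data are invented for illustration only, not a recommended design.

```python
# Illustrative contrast between a rule-based and a machine learning approach
# to the same task. Thresholds, field names and data are hypothetical.
from sklearn.linear_model import LogisticRegression

# Rule-based: the logic is fixed, explicit and auditable.
def rule_based_eligible(applicant: dict) -> bool:
    return applicant["income"] >= 300 and applicant["arrears"] == 0

# Machine learning: the logic is learned from historical data, so the
# "rules" live in fitted parameters rather than in readable code.
X_train = [[250, 1], [400, 0], [320, 0], [180, 2]]  # [income, arrears]
y_train = [0, 1, 1, 0]                              # historical outcomes

model = LogisticRegression().fit(X_train, y_train)

applicant = {"income": 310, "arrears": 0}
print(rule_based_eligible(applicant))   # True: follows directly from the stated rule
print(model.predict([[310, 0]])[0])     # depends on patterns in the training data
```

Note that the rule can be read and audited directly, while the learned model's logic lives in its fitted parameters and will change if the model is retrained on new data.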
As an aid in this comparison, we refer to some challenges posed by machine learning tools, as highlighted in the report published by the Privacy Commission on 26 September 2022.
First, machine learning algorithms require large quantities of data to develop an accurate model. (Read more about this on page 17 of the NDPA's AI report.) Central principles in the GDPR include the principle of data minimisation, which entails limiting the collection of personal data to what is necessary to accomplish the purpose of the processing, and the principle of purpose limitation, which entails limiting the use of personal data to the purpose for which it was originally collected. Furthermore, the provision concerning data protection by design specifies that the data controller shall ensure that, “by default”, only personal data necessary for the purpose of the processing is processed, see Article 25 (2).
Second, the Commission emphasizes that the source data collected and used to train a machine learning algorithm may contain errors and defects. If such defects are present in the source material, they will affect the result, i.e. the predictions of the machine learning algorithm. The organization must therefore be very conscious of which data is used to train the algorithm. Depending on the context of the processing, the data material may also be outdated, and therefore misleading. To maintain good data protection throughout the lifespan of the solution, regular adjustments will be necessary to keep the predictions accurate.
Furthermore, machine learning algorithms will rarely allow for sufficient transparency and predictability. Machine learning solutions are generally not transparent, according to the Privacy Commission’s report. In many cases, machine learning algorithms will also be dynamic. This means that their logic may change, even after the algorithm has been implemented.
Finally, the Commission points to the risk of predictions from machine learning solutions being used without critical reflection. This issue was also emphasized in the exit report for the NAV sandbox project. In practice, it could mean that what was intended as a decision-support system in reality becomes an automated decision-making system.
The Data Protection Authority would also emphasize the importance of the organization considering which of the data subjects' rights and freedoms it, as data controller, is responsible for protecting, and whether it will still be able to uphold this obligation if it implements a machine learning solution from an external developer.
If you conclude that a machine learning tool is the best solution for your needs, what steps can you take to define requirements for the product?
Ask questions, dig deeper and make demands!
We have some recommendations for how you, as data controller, can request the documentation you need to assess data protection by design in the various solutions offered.
Get the solution explained in a way you understand
Machine learning solutions can be very complex, and it is important that everyone who will be using the solution understands how it works. To meet the requirement for data protection by design in the procurement phase, it is important that those assessing the bids understand how data protection can be safeguarded when the solution is used.
We recommend that those who are in the process of procuring a machine learning solution request an easy-to-understand, detailed description of what the solution actually does.
Request to see data flows and processing records
A provider acting as data processor must be able to account for which processing activities they perform on behalf of the data controller, see Article 30 (2). As data controller, you can ask to see this processing record before you enter into a contract for the procurement of a machine learning solution. Even providers who are not data processors should be able to account for which types of personal data the solution will process. The processing record is essential for gaining an overview of how the processing actually takes place, and it is useful to read it in light of the easy-to-understand description mentioned above. It may also be relevant to request a description of how the collected data moves through the solution, i.e. the data flow.
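As an illustration only, the sketch below shows how one entry in such a processing record and the associated data flow might be documented in a structured form. The field names and values are hypothetical; the actual record must follow from the provider's documentation and the contents required by Article 30 (2).

```python
# Illustrative sketch of one entry in a record of processing activities
# (cf. GDPR Article 30 (2)) and the associated data flow. All field names
# and values are hypothetical examples.
from dataclasses import dataclass

@dataclass
class ProcessingRecordEntry:
    controller: str                      # on whose behalf the processing occurs
    categories_of_processing: list[str]  # e.g. training, prediction, storage
    categories_of_personal_data: list[str]
    transfers_outside_eea: bool
    security_measures: list[str]

entry = ProcessingRecordEntry(
    controller="Example Municipality",
    categories_of_processing=["training of prediction model", "decision support"],
    categories_of_personal_data=["contact details", "case history"],
    transfers_outside_eea=False,
    security_measures=["encryption at rest", "role-based access control"],
)

# The data flow: how collected data moves through the solution, expressed
# so that it can be compared with the provider's own description.
data_flow = ["case-handling system", "pre-processing", "prediction model",
             "case worker interface"]
print(" -> ".join(data_flow))
print(entry)
```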
Ask how the transparency requirement is handled
An even bigger issue related to the use of machine learning solutions is how to ensure that decisions made with decision support from such a solution can be explained.
As described above, the lack of transparency is a recurring issue we encounter in connection with machine learning solutions. Even so, the organization has an obligation to inform the data subject. This obligation to provide information extends to the underlying logic in certain types of automated decisions.
It is therefore important to ensure, before a contract is signed, that there are ways to present how the algorithm weighs variables and how accurate the algorithm is. The latter can, for example, be handled by the solution indicating how likely it is that the prediction is accurate.
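As a hypothetical illustration of both points, the Python sketch below (using scikit-learn; the feature names and data are invented) shows how a simple linear model can present the weight of each input variable together with an estimated probability for an individual prediction. More complex models would need dedicated explainability tooling to produce a comparable presentation.

```python
# Illustrative sketch: presenting (1) how a model weighs its input variables
# and (2) how confident it is in an individual prediction. Feature names and
# data are hypothetical; a linear model is used because its weights are
# directly readable.
import numpy as np
from sklearn.linear_model import LogisticRegression

feature_names = ["income", "arrears", "tenure_years"]
X = np.array([[250, 1, 2], [400, 0, 10], [320, 0, 5], [180, 2, 1]])
y = np.array([0, 1, 1, 0])

model = LogisticRegression().fit(X, y)

# (1) The fitted coefficients show how each variable pulls the prediction
# up or down.
for name, weight in zip(feature_names, model.coef_[0]):
    print(f"{name}: {weight:+.3f}")

# (2) predict_proba gives the model's estimated probability for each class,
# which can be shown to the user alongside the prediction itself.
applicant = np.array([[310, 0, 4]])
print("estimated probability:", model.predict_proba(applicant)[0][1])
```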
Ask what mechanisms are in place to identify and mitigate algorithmic bias
Machine learning creates some new problems related to system ethics. Potential algorithmic bias may challenge the principle of fairness in Article 5 (1) (a) of the GDPR.
See also the Data Protection Authority's AI report, p. 15 (PDF)
The Privacy Commission points out that this type of bias may occur when there is a lack of transparency in the solution. Furthermore, these biases will be exacerbated if the solution is used without critical reflection or is fed incorrect data.
In connection with the procurement of a machine learning solution, it may be a good idea to find out whether the solution has mechanisms in place to identify potential bias, as well as how often and in what way the algorithm should be adjusted. If it is possible to identify the situations in which the algorithm is less accurate, it will be easier to implement appropriate measures to reduce the consequences of this bias. Another option is to retrain the algorithm as soon as its accuracy falls below a pre-defined tolerance.
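As a purely illustrative sketch, the Python example below (using scikit-learn; the groups, data and the tolerance of 0.85 are hypothetical) shows how accuracy could be monitored per group to spot potential bias, and how a pre-defined tolerance could trigger retraining.

```python
# Illustrative sketch: monitoring accuracy per group to spot potential bias,
# and triggering retraining when overall accuracy falls below a pre-defined
# tolerance. Groups, data and the 0.85 tolerance are hypothetical.
from sklearn.metrics import accuracy_score

ACCURACY_TOLERANCE = 0.85  # agreed before the solution is deployed

def accuracy_per_group(y_true, y_pred, groups):
    """Accuracy broken down by group, to locate where the model is weaker."""
    result = {}
    for g in set(groups):
        idx = [i for i, grp in enumerate(groups) if grp == g]
        result[g] = accuracy_score([y_true[i] for i in idx],
                                   [y_pred[i] for i in idx])
    return result

# Hypothetical outcomes collected while the solution is in use.
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
groups = ["A", "A", "B", "B", "A", "B", "B", "A"]

print(accuracy_per_group(y_true, y_pred, groups))  # here: {'A': 1.0, 'B': 0.5}

if accuracy_score(y_true, y_pred) < ACCURACY_TOLERANCE:
    print("Overall accuracy below tolerance: schedule retraining.")
```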