Illustration of teenager getting a warning on the phone (Made with help from the GAI tool DALL-E.)

The Norwegian Police University College, exit report: PrevBOT

Artificial intelligence (AI) can identify meaning in text and say something about the state of mind and traits of the person writing it. The Norwegian Police University College, which is leading the PrevBOT project, wants to explore the possibility of creating a tool that can automatically patrol the open internet with the aim of detecting and preventing the sexual exploitation of minors. 

Summary

The Norwegian Police University College (PHS) wishes to utilise chat logs from criminal cases, in which the judiciary has concluded that grooming has taken place, in order to train an AI tool to identify grooming conversations. The aim is for the tool to be able to identify online social arenas that require extra vigilance and – not least – that it has the ability to flag specific ongoing conversations, allowing the police to stop grooming incidents and prevent abuse.

This is currently a research project at the concept stage. In this sandbox project, we have therefore primarily focused on whether, for research purposes, it is possible to develop PrevBOT within the framework of current legislation and ethical guidelines.

Summary of results

  • Legal: The sandbox project first addressed some general but fundamental issues relating to the police’s use of personal data for the development of artificial intelligence. However, the focus of the assessment has been on whether personal data may be processed in connection with the PrevBOT project. These assessments may also be relevant to other projects and areas of policing.
  • Ethical: PrevBOT is an example of a project that has such honourable intentions that there is a danger of the ends justifying the means as the project progresses. The PHS wants PrevBOT to be not only a legal tool, but one that also meets the requirements for responsible AI. The sandbox project has made a preliminary analysis and considers it ethical to commence research based on criminal case data. We also point out key ethical dilemmas and important values when it comes to development.
  • Technical: The PHS would like to build PrevBOT on a Tsetlin machine (TM). TM is a relatively new method of machine learning, and its strength lies in categorisation and explainability. TM is expected to offer a higher level of explainability than neural networks, meaning it is easier to receive an answer as to why the tool reaches its conclusions. A high level of explainability will be important for transparency and trust in a tool that could potentially be perceived as invasive in personal conversations. The pivotal question in the research project is whether the technological features the PHS would like to integrate into PrevBOT – each of which has been proven possible in neural networks – can actually be transferred to and combined in a Tsetlin machine. We have not been able to see how a Tsetlin machine functions in practice in the sandbox project, but we have made a theoretical assessment and are generally hopeful that in a few years, it will be able to contribute to more sustainable AI tools that incorporate good explainability.

Effective and ethical prevention of grooming using AI?

Watch the recording of the PrevBOT report launch webinar.

The way forward

The sandbox project has assessed and outlined how the PHS can legally conduct research into such an AI tool. However, a green light for PrevBOT research may be of little value if the tool being researched and developed will not be lawful to use in practice. Once in use, such a tool will inevitably need to process (sensitive) personal data. Depending on how it is implemented, its use could be perceived as rather intrusive to the privacy of victims and perpetrators, as well as to random individuals whose conversations are analysed by PrevBOT while they are online.

It would probably be wise to establish a plan early on for assessing the legality of using such a tool in practice, and that could definitely be the topic of a new sandbox project.

The PrevBOT project is still at an early stage, and the way forward depends on many decisions yet to be made. From a data protection perspective, it will be particularly interesting if the ambition is maintained that it will be a tool of prevention used to intercept attempts at grooming. The PrevBOT project is now clear that this is the goal. However, during the transition from idea to ready-to-use AI tool, there are forces that may seek to influence the project towards giving the tool the capability to collect evidence against and pursue perpetrators. The Norwegian Data Protection Authority recommends that the project identifies, at an early stage, the uses of PrevBOT it considers unethical and undesirable, and strives during the development phase to prevent such uses from being pursued.

The desire for freedom and the desire for security are often presented as conflicting goals. The PrevBOT project is an excellent example of freedom, security and privacy being interdependent – and that it is all about finding the right balance. Minors have a right to autonomy and a private life, but without a certain level of internet security, they would not be able to exercise their autonomy and freedoms. As the tool is gradually designed in more detail, an important part of the project will be to find this equilibrium.

The sandbox project has identified several ways in which PrevBOT can help make the internet and everyday life safer for vulnerable groups. PrevBOT may end up being not just one tool, but the basis for a number of different measures, which together provide effective protection against online grooming.

See the final chapter for more information about the way forward.

What is the sandbox?

In the sandbox, participants and the Norwegian Data Protection Authority jointly explore issues relating to the protection of personal data in order to help ensure the service or product in question complies with the regulations and effectively safeguards individuals’ data privacy.

The Norwegian Data Protection Authority offers guidance in dialogue with the participants. The conclusions drawn from the projects do not constitute binding decisions or prior approval. Participants are at liberty to decide whether to follow the advice they are given.

The sandbox is a useful method for exploring issues where there are few legal precedents, and we hope the conclusions and assessments in this report can be of assistance for others addressing similar issues.

About the project

For most people, ChatGPT provides evidence that machines are capable of identifying meaning in written texts. But the technology has been able to understand the content of text for a number of years, and even detect the emotions and traits of the person writing. 

Commercial stakeholders are already using this technology in marketing and customer contact, for everything from tracking a company’s online reputation or optimising a chatbot, to behavioural marketing or keeping a user engaged as long as possible in a gaming environment or on social media. But what if such technology could also prevent sexual exploitation and child abuse?

Professor Inger Marie Sunde at the Norwegian Police University College (PHS) felt inspired to consider that idea a few years ago. Part of the inspiration was the Sweetie project at Leiden University, in which a chatbot was developed in the form of a computer-generated, ten-year-old girl from the Philippines. ‘She’ was able to observe and automatically participate in chat rooms, and her purpose was to prevent sexual exploitation of children via webcams. The project reportedly detected over 1,000 online abusers.

Professor Sunde has since written two academic articles, co-authored by her colleague Nina Sunde, on how a PrevBOT could prevent attempts to exploit children.

Read: Part 1 - The Theoretical and Technical Foundations for PrevBOT

Read: Part 2 – Legal Analysis of PrevBOT

The PrevBOT project is organised as a research project consisting of several work packages, where the role of the PHS is to lead the project and the various stakeholders. There are a number of stakeholders involved in the work, including the Police IT Unit (PIT), the Centre for Artificial Intelligence Research (CAIR) at the University of Agder (UiA) and Kripos, the National Criminal Investigation Service.

Abuse – from offline to online

Child Sexual Exploitation and Abuse (CSEA) did not originate with the internet, but the digitisation of society has given abusers new and easily accessible places where children come together, often without parental supervision. With the internet, smartphones and the advent of social media and chat rooms, it is now easier for sexual abusers to make contact with minors to exploit. This is reflected in the dramatic increase in reports of CSEA.

The increased prevalence of social media, messaging and live streaming apps in recent years has led to a dramatic increase in reports of this type of crime.

- ECPAT International (2020)

It is often in chat rooms that abusers make contact with potential victims. Gaming platforms and social networks with messaging features are also places where abusers initiate contact with children. Once contact and interest are established, the abusers will often try to move the conversation and further grooming to more hidden channels, such as direct messaging.

Grooming is a manipulative process in which an adult or a person somewhat older than the minor builds trust and establishes a connection for the purpose of sexually exploiting them. It often includes virtual communication through social media and internet platforms. The groomer may use various means such as compliments, gifts, manipulation and threats to make the victim feel safe and at ease before turning the conversation towards topics of a sexual nature. 

Purely online abuse

The digital age has also brought with it a new form of sexual exploitation: purely online abuse. The abuse often takes place in the child’s own bedroom while the parents remain oblivious to the ongoing crimes. Children are coaxed into performing sexual acts in front of the camera, either alone or with someone else. This kind of violation can feel just as severe, and is both prolonged and intensified by the knowledge that a recording most likely exists somewhere ‘out there’. Such exploitation material is often shared on the dark web. 

Public health issues

‘Highly active offenders have often been able to operate undisturbed for extensive periods of time, allowing them to make contact with perhaps several hundreds of victims before being stopped,’ according to a report from the Norwegian Centre for Violence and Traumatic Stress Studies (NKVTS) and NOVA Norwegian Social Research at OsloMet. The report was released in January 2024, and its sources speak of a tenfold increase in the problem in the past decade. There are now more cases, they are larger in scale and their content is more severe.

See also NRK’s review of the report: ‘Grovere, hardere, kaldere' (‘Cruder, rougher and more chilling’ – in Norwegian only)

Experience from Norwegian criminal cases shows that, using the internet, perpetrators can commit abuse against a huge number of children at the same time. This was rarely the case before the internet, but with it, we have seen cases such as a 27-year-old who abused 270 children, predominantly boys under the age of 14, over a period of two-and-a-half years.

The Ombudsperson for Children referred to it as a public health problem when the report ‘Alle kjenner noen som har opplevd det’ (‘Everyone knows someone who has experienced it’ – in Norwegian only) was published in 2018. A survey conducted by NKVTS in 2020 showed that 4.5 per cent of the young participants had experienced sexual exploitation from an adult in the past year, while the National Safety Survey 2022 showed that 12 per cent of girls and 2.6 per cent of boys between the ages of 16 and 19 had been exposed to digital sexual violence in the past year.

It should nonetheless be pointed out that most sexual assaults are committed by individuals of a similar age or close relatives, something PrevBOT, as it is conceived, will be unable to prevent. However, with a sharp increase in online grooming and purely online abuse, an effective weapon to combat the issue will have a noticeable impact on the overall picture.

International cooperation

International cooperation has been established to combat online CSEA. In 2017, the EU made online CSEA one of the top ten priority issues within organised and serious international crime. Despite these efforts, the number of cases and victims is on the rise. As such, it is not sufficient to simply pursue abusers and attempt to tackle the issue through investigation.

Preventive measures are essential.

How will PrevBOT work?

‘PrevBOT’ stands for ‘Preventive Robot’, a reference to its preventative purpose and the artificial intelligence implemented in the robot technology.

PrevBOT can be present in chat forums, but in its basic form, it is not a generative chatbot that can participate autonomously in conversations. Whether such an interactive function should be integrated in the future needs to be assessed in relation to, for example, the legal boundaries for infiltration and entrapment as well as ethical considerations. The sandbox process has treated PrevBOT as a passive observation tool, and it is therefore not to be considered a chatbot.

The fundamental feature a PrevBOT requires is the ability to monitor conversations in open online forums and identify, in real time, conversations in which grooming is taking place. By extracting statistics from PrevBOT, the police will also be able to identify places online where there is an increased risk of grooming. The robot should, in other words, be able to indicate problematic areas of the internet and point out individuals.

How can grooming be identified?

There are essentially three main internal functions the bot needs to perform:

  1. Detect grooming language

    The bot must recognise words and phrases relating to sex talk, and not only in the lexical sense. It also needs to remain up to date in terms of slang and code words. With effective and continuous training and updating, it may be able to recognise the signs of a grooming conversation before the language becomes sexually explicit.
  2. Detect fake profiles

    The robot must be able to estimate the gender and age of the person chatting. Many abusers pretend to be something other than what they really are (including minors). By estimating gender and age, the robot can detect conversations in which there is a significant difference in age. This will allow PrevBOT to detect whether there are adults in forums in which the other users are young, or vice versa, if a minor gains access to an adult-only chat room.
  3. Perform sentiment analysis

    The bot must identify the emotions of the individuals chatting. Response time, typing speed, the language used and writing style can reveal, for example, whether a person is aggressive/persistent/impatient, even if the written content suggests a calm and relaxed demeanour. This can be a sign that a user’s intentions differ from the intentions they express.

The bot does not necessarily have to suspect deception about age/gender and emotion for the conversation to be classified as grooming, but together, these three detections will provide useful input when assessing a conversation.

When PrevBOT classifies a conversation as potential grooming, the conversation is flagged. At that point, the idea is that humans take over and decide whether or not there are grounds for intervention and how it should be implemented. In other words, PrevBOT is intended as a decision support tool.
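
Purely as an illustration of this decision-support role, here is a minimal sketch that assumes three detector scores between 0 and 1. The names, weights and threshold are invented for illustration and are not part of the PrevBOT design.

```python
from dataclasses import dataclass

@dataclass
class ConversationSignals:
    """Hypothetical detector outputs for one ongoing conversation (all values 0-1)."""
    grooming_language_score: float   # output of the grooming-language detector
    profile_mismatch_score: float    # output of the fake-profile / author-profiling check
    sentiment_risk_score: float      # output of the sentiment analysis

def should_flag(signals: ConversationSignals, threshold: float = 0.7) -> bool:
    """Combine the three signals with illustrative weights and compare to a threshold.

    A flag triggers no automatic action; it only routes the conversation to a
    human reviewer, mirroring the decision-support role described above.
    """
    combined = (0.5 * signals.grooming_language_score
                + 0.3 * signals.profile_mismatch_score
                + 0.2 * signals.sentiment_risk_score)
    return combined >= threshold

# Example: strong grooming-language signal, clear profile mismatch, calm sentiment.
example = ConversationSignals(grooming_language_score=0.9,
                              profile_mismatch_score=0.7,
                              sentiment_risk_score=0.3)
print(should_flag(example))  # True -> hand the conversation over to a human assessor
```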

The manner in which the police intervene in flagged conversations has not been determined. The original idea of the project was that the groomer receives a warning of some kind and the grooming attempt is thereby intercepted. The police already have online patrols with experience of monitoring such arenas, and the hope is that PrevBOT will increase their capacity.

The sandbox project has discussed whether it would be just as effective if the ‘victim’ also received a message, or the possibility that only the ‘victim’ receives a warning. For a vulnerable minor, it may be disconcerting for a conversation to be suddenly ended without explanation. We did not conclude what would work best but recommend that the PrevBOT project test the alternatives – and at any rate take into account the ‘victims’ – when determining the manner in which grooming attempts are intercepted.

Detecting sexualised language

A significant amount of research has been conducted internationally on grooming linguistics. Many of the studies are based on R. O’Connell’s five-step model of online grooming processes. As the model indicates, it may only be in the fifth step that the conversation becomes explicitly sexual. However, it is possible to recognise grooming attempts during the previous four steps. The Risk Assessment step can be particularly revealing. Recent research suggests that online groomers are now more impatient, perhaps cautious, and carry out risk assessment at an earlier stage of the process.

Figure: The five steps of the online grooming process.

With machine learning (ML), natural language processing (NLP) and neural networks, models can be trained to recognise the signs of a grooming conversation. They are trained using reference data from conversation logs in which grooming was retrospectively found to have occurred.
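
As a minimal sketch of this kind of supervised training, the example below fits a simple baseline classifier (TF-IDF features and logistic regression, which is not the Tsetlin-machine approach the PHS envisages) on a handful of invented, labelled chat lines standing in for real reference data.

```python
# Baseline grooming-text classifier: TF-IDF features + logistic regression.
# The training data below is a stand-in; real reference data would be labelled
# chat logs in which grooming was retrospectively found to have occurred.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts = [
    "what school do you go to, is anyone home with you?",
    "did you finish the maths homework for tomorrow?",
    "promise you won't tell your parents about our chats",
    "our team lost the match again this weekend",
]
labels = [1, 0, 1, 0]  # 1 = line taken from a grooming conversation, 0 = not

model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
model.fit(texts, labels)

print(model.predict(["are you alone right now? this is our secret"]))
```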

Stylometry is the study and analysis of linguistic style and writing patterns that allows vocabulary, sentence lengths, word frequency and all other quantifiable text characteristics to be evaluated. For example, it may be interesting to see how often a question is asked in a conversation. Researchers Borj and Bours at the Norwegian University of Science and Technology (NTNU) have had promising results in their attempts to recognise grooming conversations. Using various classification techniques, they succeeded in detecting abusers with up to 98 per cent accuracy.
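
To make the idea of quantifiable text characteristics concrete, the sketch below computes a few simple stylometric features of the kind mentioned above, including how often questions are asked. The feature set is illustrative and is not the one used by the cited researchers.

```python
import re

def stylometric_features(text: str) -> dict:
    """Compute a few illustrative stylometric features for a chat message or log."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"\b\w+\b", text.lower())
    questions = text.count("?")
    return {
        "question_rate": questions / max(len(sentences), 1),       # how often questions are asked
        "avg_sentence_length": len(words) / max(len(sentences), 1),
        "type_token_ratio": len(set(words)) / max(len(words), 1),  # vocabulary richness
    }

print(stylometric_features("How old are you? Where do you live? I like football."))
```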

Detecting deception

Author profiling involves analysing texts to identify an author’s gender, age, native language, personality traits, emotions and similar characteristics. Experiments show that such profiling can be impressively accurate, especially if the categories are broad – e.g. is the writer a child (under the age of 18) or an adult (for example, over the age of 25) – and if the model is trained in specific topics (e.g. chat room conversations) rather than a broader range of areas.

If the person chatting, or the user profile they have written, claims something other than the category author profiling assigns them, it may indicate that grooming is underway.
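
A hedged sketch of that mismatch check is shown below; predict_age_band stands in for a trained author-profiling model and is hard-coded here purely for illustration.

```python
def predict_age_band(text: str) -> str:
    """Placeholder for a trained author-profiling model that returns a broad
    age band ('minor' or 'adult') from writing style. Hard-coded for illustration."""
    return "adult"

def profile_mismatch(claimed_age: int, chat_text: str) -> bool:
    """Flag a mismatch between the age a user claims and the band the model predicts."""
    claimed_band = "minor" if claimed_age < 18 else "adult"
    return predict_age_band(chat_text) != claimed_band

# A user claiming to be 14 whose writing style is classified as adult:
print(profile_mismatch(claimed_age=14, chat_text="..."))  # True -> possible deception
```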

Interpreting emotions

Sentiment analysis uses NLP and machine learning techniques to identify and extract subjective information from text data: artificial intelligence reads what is written and sorts the text into categories of sentiment. A simple example would be a company that monitors how its products are reviewed. Such an analysis can categorise text as ‘positive’, ‘negative’ or ‘neutral’, or sort it at an even more granular level.
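
As a toy illustration of sorting text into sentiment categories, here is a minimal lexicon-based categoriser. Real sentiment analysis relies on trained models; the word lists are invented.

```python
# Toy lexicon-based sentiment categoriser: counts positive and negative words
# and sorts the text into 'positive', 'negative' or 'neutral'.
POSITIVE = {"great", "love", "excellent", "happy", "good"}
NEGATIVE = {"terrible", "hate", "awful", "angry", "bad"}

def categorise(text: str) -> str:
    words = text.lower().split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(categorise("I love this product it is excellent"))  # positive
print(categorise("Terrible support I hate waiting"))      # negative
```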

Sentiment analysis is used in many different areas. The entertainment industry uses it to measure audience reactions to TV series when assessing whether to conclude or extend a production. In politics, it is used to analyse people’s reactions to political initiatives and events, and in the financial sector, it is used to capture trends in the financial market.

The above examples relate to group-level sentiment analysis, but it can also be used at an individual level. The same methods are used when social media platforms track your activity – what you like, what you comment on, what you post and where you stop when scrolling. The better they know your emotional life, the more effectively they can target you with ads and content.

Keystroke biometrics

Today’s technology can do more than categorise an author, as with author profiling, or identify the author’s true emotions. It can actually identify an individual author using keystroke biometrics, based on the idea that an individual’s use of language is so unique that it is like a textual fingerprint. There is a preliminary plan to include this feature in PrevBOT, providing it with the ability to recognise previously convicted sex offenders who have become active again online. This feature has not been discussed in the sandbox project, however.

Explainable Tsetlin

The PHS envisages building PrevBOT on a Tsetlin machine (TM). The benefit of a Tsetlin machine is that it has a higher level of explainability than neural networks. In a project like PrevBOT, where people are to be categorised as potential abusers based on (in most cases) open, lawful, online communication, it will be important to be able to understand why the tool reaches its conclusions.
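
The explainability stems from the Tsetlin machine representing a class as a set of conjunctive clauses over boolean features. The toy sketch below uses hand-written clauses and invented feature names to show why such a representation can explain its own output; a real Tsetlin machine learns its clauses and also uses negated literals, negative-polarity clauses and a vote threshold, all omitted here.

```python
# Toy illustration of clause-based classification and explainability.
# A real Tsetlin machine *learns* such clauses; here they are hand-written
# to show why the representation is easy to inspect and explain.

features = {
    "asks_to_keep_secret": True,
    "asks_about_parents": True,
    "sexual_vocabulary": False,
    "age_mismatch": True,
}

# Each clause is a conjunction of feature literals voting for the class 'grooming'.
clauses = [
    {"asks_to_keep_secret": True, "asks_about_parents": True},
    {"sexual_vocabulary": True, "age_mismatch": True},
]

def fired(clause: dict, feats: dict) -> bool:
    return all(feats.get(name) == value for name, value in clause.items())

votes = [clause for clause in clauses if fired(clause, features)]
print("classified as grooming:", len(votes) > 0)
print("because these clauses fired:", votes)  # the explanation is the clauses themselves
```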

A detailed description of the Tsetlin machine can be found in Chapter 6 of this report.

Figure: A sketch of PrevBOT.

The figure above illustrates PrevBOT’s decision-making process. PA stands for problematic areas, while PI stands for problematic individuals. The illustration is taken from Sunde & Sunde’s article from 2021. The discussions in the sandbox have assumed that the capability of tracing textual fingerprints will not be included as a feature in PrevBOT.

Starting an investigation is given as an alternative in the figure, but the PrevBOT project has stated that the option to start an investigation would be most relevant in conjunction with the keystroke biometrics function, which is no longer applicable.

Goals for the sandbox project

PrevBOT raises a number of questions and some clear ethical dilemmas. A fundamental legal question is whether it is even possible for the PHS/the police to develop and use such a tool without breaching data protection regulations.

Will the answer depend on how PrevBOT is intended to operate and which features are integrated or not? Or will the answer depend on which data, including personal data, is used to train such a robot? And if it is the latter, how should that data be processed when developing the tool, and where should it be stored when it is in use? Can PrevBOT – developed with the use of personal data – be taken into use in police online crime prevention activities?

The number of questions such a tool raises does not diminish as we move from a purely legal to an ethical perspective. Is it ethical to monitor ‘everyone’ in order to catch a few perpetrators (even if they are becoming increasingly numerous)? Where on the spectrum is it most ethical to calibrate PrevBOT: flagging a conversation as early as possible, at the risk of mislabelling an innocent individual, or waiting until the grooming is more obvious, at the risk of letting the perpetrator slip away and lead the victim into a private conversation? And if PrevBOT becomes a purely preventive tool that simply frightens off abusers and alerts potential victims, would it be ethical for the police not to attempt to apprehend a potentially dangerous individual if they had received information that could identify the perpetrator?

Topics and delimitations

With the project at such an early phase and with so many directional choices to be made, it was simply not possible to make an overall assessment of the PrevBOT project.

For assessing the legality of PrevBOT, the sandbox project was delimited to the development phase. The most central discussions revolved around the confidential text data from Norwegian criminal cases that involved online abuse, which the PrevBOT project wishes to use as training and test data in the development phase. The project already has a small amount of such data by virtue of a permit from the Director of Public Prosecutions, cf. the Police Databases Act Section 33, which allows the duty of confidentiality to be lifted for research purposes.

The sandbox project also aimed to identify and partially discuss some of the ethical questions surrounding the start-up and early phase of the PrevBOT research project, so that the project has some guidance when setting the course for the tool’s development.

The objectives were specified as follows:

  1. Clarify legal requirements for processing textual data used as evidence in completed criminal cases, during the development phase of PrevBOT.
  2. Specify what ‘responsible AI’ means when the police use the technology to analyse communication on the internet, with perhaps particular focus on explainability.

Legal: General information about the police’s processing of personal data for the development of AI

The development of AI consists of several stages, including the development, use and continued learning phases:

Figure: The three phases – development, use and continued learning.

The presentation of legal considerations will focus on the police’s processing of personal data in the first stage, the development stage. During this stage, we can envision two different situations in which personal data is processed by the police for the development of artificial intelligence, one of which would involve processing data for research purposes. Development can thus take the form of research, which means that ‘research’ and ‘development’ are not necessarily mutually exclusive terms. The report will primarily focus on the processing of personal data for research into the development of artificial intelligence in the PrevBOT project (situation 2 in the table).

We will, however, begin with a general introduction to the legislative landscape that applies to policing, in an attempt to provide some initial clarifications for situation 1 in the table. Firstly, it is important to clarify which regulations apply, i.e. the Police Databases Act or the General Data Protection Regulation. Secondly, since there is talk of using investigative data/criminal case data in the development of the artificial intelligence, the question arises as to whether the law permits such processing beyond the original purpose.

Finding the right regulations

The General Data Protection Regulation (GDPR) is incorporated into Norwegian law through the Personal Data Act, meaning that the regulation applies as Norwegian law. The main rule is that all processing of personal data is regulated by the Personal Data Act. This applies unless the GDPR makes exceptions to its scope of application. The Personal Data Act Section 2 governs the substantive scope of the Act and states that in the event of a conflict, the provisions in the General Data Protection Regulation take precedence over provisions in any other statute that regulates the same matter, cf. Section 2 of the EEA Act.

The police’s processing of personal data is primarily regulated by the Police Databases Act, which implements the Law Enforcement Directive (LED), supplemented by the Police Databases Regulations. The rules set out in the GDPR do not apply in the area covered by the Law Enforcement Directive, cf. the GDPR Article 2(2)(d) and the Law Enforcement Directive Article 1. In other words, the legislator’s intention is that the processing of personal data falls under one or the other regulatory framework.

When considering the police’s processing of personal data for the development of artificial intelligence, the scope of the two acts in question must be considered to determine whether processing falls under the Personal Data Act/GDPR or the Police Databases Act.

In this assessment, we will begin by looking at the exemption from the scope of the GDPR set out in Article 2(2)(d). The exemption states that the GDPR does not apply to the processing of personal data by competent authorities ‘for the purposes of the prevention, investigation, detection or prosecution of criminal offences or the execution of criminal penalties, including the safeguarding against and the prevention of threats to public security’.

In the event that the police process personal data for purposes other than those mentioned, the processing will be regulated by the GDPR. It is therefore the purpose of the processing that determines which regulations apply to the police’s processing of personal data.

There is little doubt that the development of tools that use artificial intelligence will help the police carry out their social mission and thus allow them to utilise their resources more effectively to combat crime. However, the exceptions listed in the GDPR Article 2(2)(d) are, according to its wording, directed towards more traditional and typical ‘police tasks’. The European Court of Justice has also ruled that the exception in the GDPR Article 2(2)(d) shall be interpreted ‘strictly’.

In the view of the Data Protection Authority, the exemptions in the GDPR Article 2(2)(d) are aimed at the police’s crime-fighting activities. Given that the exceptions are to be interpreted strictly, it is difficult to construe how the development of artificial intelligence tools could fall within them. The interpretation is also considered to be in line with the definition of ‘police purposes’ set out in the Police Databases Act Section 2(13), which encompasses the police’s activities against crime, including investigation, preventive efforts and the activities of the uniformed service, and the police’s service and assistance functions and keeping of police logs. It is assumed that this definition does not cover technology development as such either, even if the purpose of such development is creating a tool to aid activities against crime.

The systematics of the legislation are also significant to their interpretation. When it comes to the rights of data subjects, the GDPR provides a stronger safeguard than the Police Databases Act. According to general data protection principles, information and access are key rights for the data subject, enshrined in GDPR Articles 13, 14 and 15. These rights are not as strongly safeguarded when personal data is processed for crime-fighting purposes, due to the specific considerations that apply in the area.

Therefore, in the view of the Data Protection Authority, the use of personal data for the development of artificial intelligence in policing will in general be regulated by the GDPR, because such processing is unlikely to fall within the exceptions to the material scope set out in GDPR Article 2(2)(d).

On further processing of personal data for a new purpose

If personal data has been collected by the police for ‘police purposes’ in line with the Police Databases Act, and the police wish to further process the data for the development of artificial intelligence – which is a different purpose to ‘police purposes’ – the question arises as to what conditions must be met to carry out this processing. As illustrated above, this will generally be determined by the rules of the GDPR. 

The entity that discloses/makes personal data available must:

  1. have the authority to disclose the personal data. There may be legal prohibitions that prevent personal data from being disclosed for processing for another purpose. For example, the Police Databases Regulations Chapters 8 and 9 set out restrictions and conditions for access to and disclosure of information, respectively.
  2. have a legal basis for processing in order to disclose the personal data.
  3. carry out a compatibility assessment of the purposes. Since the personal data was originally collected for police purposes, it must be assessed whether using the data to develop an artificial intelligence tool to help fight crime is compatible with the original purpose of collection, cf. Article 6(4). According to the provision, the assessment must take into account, among other things, the factors specified in letters (a) to (e). This means that the provision, in certain cases, allows personal data to be further processed for a new purpose.

The entity that will further process the personal data for a new purpose must have a legal basis for processing. If it is the same data controller that both discloses/makes available and further processes the personal data for a new purpose, then it is that entity that carries out all of the above assessments.

Legal: Data flow in the PrevBOT project

In this chapter, we will approach the legality of this specific project, starting with an exercise from which ‘everyone’ can learn: gaining an overview of the data flow in the project.

In order to conduct a legal analysis, it is essential to gain an overview of the data flow in the project. In connection with the development of the algorithm in the PrevBOT project, two main groups of data are processed:

  1. One group is retrieved from publicly available datasets from the National Library of Norway. The purpose of processing this data is to train the AI model in the Norwegian language.

The Data Protection Authority understands that this language training will take place within the work package at the Centre for Artificial Intelligence Research (CAIR)/University of Agder (UiA) as part of the PrevBOT project. The PHS itself assumes that the use of this data will not trigger any data protection issues. Given the scope of the project, the report does not consider this group of data further.

  2. The second group of data consists of information from confidential chat logs obtained from Norwegian criminal cases (criminal case data).

The chat logs used as evidence in criminal cases consist of transcripts of chat conversations between a perpetrator and victim in which grooming has taken place. A small number of relevant cases were identified in the pre-project ‘Nettprat’ (online chats). For more information about the specific collection of data from chat logs, see page 20 of the report from the Nettprat project.

The chat logs may contain a variety of personal data, depending on what the participants in the conversation share. It is also conceivable that the logs may contain metadata that contains personal data.

Based on the information in the chat logs, the algorithm may be able to capture personal data, even if that data is not explicitly part of the training data. For example, it is conceivable that the algorithm could capture a person’s textual fingerprint, which can often be considered personal data. In such cases, it may be possible – at least theoretically – to re-identify a person with a certain degree of probability, even if no directly identifying personal data is included. The PHS states that such identification requires the existence of a reference database with textual fingerprints. According to the information provided, PrevBOT will not have this function and such a database will not be created.

The personal data processed may pertain to the following categories of data subjects:

  • The victim in a criminal case
  • The perpetrator in a criminal case
  • Third parties that may be mentioned in a chat conversation

When processing chat logs, it is conceivable that the following processing activities related to personal data may be conducted:

  • Disclosure of chat logs from various police districts to the Police IT Unit
  • Removal (‘cleansing’) of personal data from chat logs at the Police IT Unit
  • Disclosure of chat logs from the Police IT Unit to CAIR/UiA (provided that the personal data is not completely anonymised)
  • Data preparation/structuring at CAIR/UiA (provided that the personal data is not completely anonymised)
  • Training of the algorithm at CAIR/UiA (provided that the personal data is not completely anonymised)
  • Analysis at CAIR/UiA (provided that the personal data is not completely anonymised)

The Police IT Unit (PIT) has a supporting role in the project and receives a copy of the chat logs directly from local police districts. PIT ensures that the confidential chat logs are securely stored and are not exposed to anyone other than those who have lawful access to the data. Before the chat logs are made available to CAIR at UiA, PIT must remove identifying information about the perpetrator and the victim. The PHS considers that this information is in any case not relevant to the project. At PIT, the chat logs must also be machine cleansed, so that names, addresses, phone numbers and any other directly identifying information are redacted and replaced with ‘XX’.
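
A minimal sketch of this kind of machine cleansing is shown below, using regular expressions to replace directly identifying strings with ‘XX’. The patterns are illustrative and far simpler than a production pipeline would require.

```python
import re

# Illustrative patterns for directly identifying information. A real cleansing
# pipeline would need far more robust detection, e.g. name lists or NER models
# for personal names and addresses.
PATTERNS = [
    re.compile(r"\b(?:\+47\s?)?\d{8}\b"),        # eight-digit Norwegian phone numbers
    re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),  # e-mail addresses
]

def cleanse(line: str) -> str:
    """Replace directly identifying strings in a chat line with 'XX'."""
    for pattern in PATTERNS:
        line = pattern.sub("XX", line)
    return line

print(cleanse("ring meg på 91234567 eller send mail til ola@example.com"))
# -> "ring meg på XX eller send mail til XX"
```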

Personal data legislation does not apply to anonymous data. Data is considered anonymous if it is no longer possible to identify individuals in the dataset using means that could reasonably be expected to be used.

About anonymisation

There are many pitfalls when anonymising data and the Data Protection Authority generally considers it challenging to anonymise personal data with certainty. It is therefore important to undertake thorough risk assessments before processing anonymous data, and to employ sound anonymisation techniques.

According to the PHS’s plans, chat logs will be anonymised before they are processed by CAIR at UiA. On that basis, personal data will only be processed within the PrevBOT project from the time the chat logs are made available until anonymisation takes place.

If no personal data is processed during the development phase of PrevBOT, the data protection regulations will not apply. Information from criminal cases may as such be processed in the research project without being restricted by data protection regulations, provided that the resultant data is considered anonymous in accordance with the General Data Protection Regulation.

The way forward – the following is a secondary presentation

The Data Protection Authority acknowledges that there is a risk that personal data may be processed in the PrevBOT project. In any case, the Data Protection Authority assumes that the PrevBOT project will process personal data in the aforementioned processing activities in order to proceed with the legal analysis. As such, a large part of the following will be a secondary discussion and is therefore intended as a guide.

When personal data is processed for research purposes, a number of conditions must be met. The data controller must consider a number of different factors in order to ascertain whether the activity concerns the processing of personal data for research purposes. It is important to note that even if the processing of personal data is found to be for research purposes, the requirements of the GDPR must be upheld. The Data Protection Authority has a general concern that an overly broad interpretation of the research concept could lead to misuse in this particular situation.

Legal: How can personal data for research in the PrevBOT project be processed lawfully?

In the following, we will explore the legal scope for processing personal data for research purposes.

What is research?

Recital 159 of the GDPR sets out that ‘Where personal data are processed for scientific research purposes, this Regulation should also apply to that processing’. Since PrevBOT is a research project, personal data will be processed for research purposes. It must therefore first be considered whether the development of PrevBOT falls under the GDPR’s concept of research, because the processing of personal data for research purposes holds a special position in the GDPR, with certain exceptions to the general rules.

There is no universal and accepted definition of the term ‘scientific research’. Nor is the term defined in the GDPR or the Personal Data Act.

The OECD’s definition of research

The OECD has established the following international guidelines for the delimitation and classification of research:

Research and experimental development (R&D) comprise creative and systematic work undertaken in order to increase the stock of knowledge – including knowledge of humankind, culture and society – and to devise new applications of available knowledge.

Retrieved from the Frascati Manual 2015 (p. 28)

According to Recital 159 of the GDPR, the term ‘scientific research’ is to be interpreted broadly and may include, for example, technological development and demonstration, basic research and applied research. It should be mentioned that the European Data Protection Board (EDPB) is currently working on guidelines for the processing of personal data for scientific purposes, which is expected to provide an overview and an interpretation of the various provisions governing research in the GDPR.

PrevBOT is organised as a research project

The PrevBOT project is led by the Norwegian Police University College (PHS). In addition to being the central educational institution for police education, the PHS also offers continuing and further education, a master’s degree programme and research in police science. As mentioned in the introduction, the PrevBOT project is organised as a research project. The scope of the project is set out in the project memo the PHS submitted to the Ministry of Justice and Public Security.

The Data Protection Authority has not made an independent assessment of whether the processing activities in the PrevBOT project constitute processing of personal data for research purposes, but assumes that the actual development of PrevBOT by the PHS is to be considered ‘scientific research’ pursuant to the GDPR.

Difference between development (research) and use of PrevBOT

The Data Protection Authority emphasises that there is a difference between the processing of personal data for research into the development of artificial intelligence and the use of PrevBOT. The actual use of PrevBOT clearly falls outside the research definition. In the view of the Data Protection Authority, the continued learning of an algorithm in widespread use cannot be considered scientific research. It is therefore important that those responsible for the PrevBOT project are aware of the distinction between research and use, so that they do not rely on the research track in the data protection regulations with respect to the use and continued learning of the solution.

The data controller in a research project

The data controller is responsible for compliance with the data protection principles and regulations under the GDPR, and determines the purpose of the processing of personal data and the means to be used, cf. the GDPR Article 4(7). There may be one or more data controllers (either joint or separate) in a research project, and the data controller must, among other things, ensure that there is a basis for processing.

The PrevBOT project consists of several work packages involving different actors, and it is important that the data controller(s) are identified. In the Data Protection Authority’s view, it would seem that the PHS can be deemed a data controller in the PrevBOT project. If there are two or more data controllers, they may be joint data controllers under the GDPR Article 26. If any of the actors are deemed to be data processors, a data processor agreement must be entered into with them.

About the disclosure of personal data from the police

The personal data (chat logs) in PrevBOT were originally collected by the police for the purpose of investigation. When the police make personal data available for processing in the PrevBOT project, the police must have a legal basis for such disclosure. We have not considered disclosure of this kind, as this processing activity falls outside the scope of the PrevBOT project.

We emphasise, nonetheless, that the assumptions mentioned in Chapter 5 (Legal: General information about the police’s processing of personal data for the development of AI) that apply to ‘the entity that discloses/makes personal data available’ must be in place. The Data Protection Authority will also make some comments regarding the compatibility assessment in the context of further processing for use in research.

Compatible further processing

The compatibility assessment applies to information that the data controller already has. When the chat logs are made available to the PrevBOT project for use in research, this is considered processing of personal data for ‘secondary purposes’. ‘Secondary purposes’ here means the types of purposes specified in the GDPR, including ‘purposes related to scientific research’, cf. Article 5(1)(b). Research as a secondary purpose is considered a compatible purpose, cf. the GDPR Article 6(4), provided that the necessary safeguards in accordance with the GDPR Article 89 are in place. This means that the personal data can be processed for this purpose, provided that such safeguards exist.

In the view of the Data Protection Authority, the processing activities set out above will have the same purposes, i.e. purposes related to scientific research. However, it follows from the preparatory work to the Personal Data Act that if further processing for research purposes entails data being disclosed to other data controllers, the data controller receiving the personal data must be able to demonstrate an independent basis for processing.

Legal basis for research

All use of personal data must have a basis for processing for it to be lawful. The GDPR does not establish any specific bases for processing for research purposes, which means that it is the general processing bases in the GDPR Article 6 that are applicable. Several bases for processing may be relevant for research purposes, but for the PrevBOT project, the Data Protection Authority considers Article 6 (1)(e) to be of particular relevance. The provision covers, among other things, the processing of personal data that is ‘necessary for the performance of a task carried out in the public interest’.

The GDPR does not provide any guidelines on what is to be deemed a task in the public interest. The Norwegian commentary to the GDPR assumes ‘that it is not up to each Member State to define what constitutes a task of public interest, but that a common European standard will be developed over time in this area’.

The necessity requirement sets the framework for what constitutes lawful processing of personal data. It is the specific processing of personal data that must be necessary for the performance of a task in the public interest. Relevant aspects to assess include whether the task in the public interest can be fulfilled through less intrusive processing, and whether the processing goes further than the task requires. The European Data Protection Board (EDPB) has stated the following in its guidelines:

Assessing what is ‘necessary’ involves a combined, fact-based assessment of the processing ‘for the objective pursued and of whether it is less intrusive compared to other options for achieving the same goal’. If there are realistic, less intrusive alternatives, the processing is not ‘necessary’.

The Data Protection Authority does not consider an in-depth assessment necessary here, as it seems fairly evident that the processing of personal data within the framework of the PrevBOT project meets the conditions set out in Article 6(1)(e).

Processing of special categories of personal data

Article 9(1) sets out a general prohibition on processing special categories of personal data. The processing of special categories of personal data requires a basis for processing pursuant to the GDPR Article 6. One of the exceptions listed in Article 9(2) must also apply.

Information about a person’s sex life is considered a special category of personal data pursuant to Article 9(1). The Data Protection Authority assumes that when personal data is to be processed in the PrevBOT project, it will often fall under the special categories in Article 9. If special categories of personal data are processed, it will be relevant to consider the GDPR Article 9(2)(j).

This provision stipulates several conditions:

  1. The processing is necessary for the purposes of scientific research in accordance with the GDPR Article 89(1).
  2. Supplementary legal bases are required (see more on this below).
  3. The processing must also be proportionate to the aim pursued, respect the essence of the right to data protection and provide for suitable and specific measures to safeguard the fundamental rights and the interests of the data subject. Whether these conditions are met is based on a specific assessment.

Supplementary legal basis

Processing personal data pursuant to the GDPR Article 6(1)(e) and Article 9(2)(j) requires a supplementary legal basis, which means that the basis for the processing set out in those provisions must be ‘determined’ by Union or Member State law. This follows from Article 6(3), which also sets out requirements for the wording of such legislation, for example that it should state the purpose of the processing, the types of data registered, the data subjects affected and rules for further processing. How precise the supplementary legal basis must be has to be assessed specifically in each case.

When special categories of personal data are processed, the principle of legality plays a more significant role, and progressively stricter requirements apply to the formulation of the legal authority in the supplementary legal basis.

In the context of the PrevBOT project, three potential supplementary legal bases will be presented. 

1. Sections 8 and 9 of the Personal Data Act

In Norwegian law, there is little specific regulation of research, except in the area of health research, and no specific legal authority for research is set out in the Police Databases Act. We must therefore take into account the general supplementary legal bases for research set out in Sections 8 and 9 of the Personal Data Act. The purpose of Sections 8 and 9 is to provide a supplementary legal basis for research where no other supplementary legal basis exists in special legislation.

The provision in Section 8 sets out several conditions. Firstly, the research must be for scientific research purposes. Reference is made to the discussion above, and the Data Protection Authority considers the processing of personal data in PrevBOT to meet this condition. 

Furthermore, the processing must be necessary. In the legal commentary on the legal resources website Juridika, the necessity condition here is assumed to have no independent meaning beyond the necessity condition in the GDPR Article 6(1)(e). As long as the condition in Article 6 is met, it will also be met in accordance with Section 8 of the Personal Data Act.

In addition, there is a requirement that the processing be subject to the necessary safeguards in accordance with the GDPR Article 89(1). What constitutes ‘necessary safeguards’ is not stated in the GDPR, but the provision sets out examples of measures that can be taken to ensure the rights and freedoms of the data subject. Among other things, technical and organisational measures must be implemented to ensure compliance with the principle of data minimisation. Such measures may include pseudonymisation, encryption, strict access management, set dates of erasure, etc.
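
As a concrete illustration of one such measure, the sketch below pseudonymises a direct identifier with a keyed hash. The key name and truncation length are illustrative, and the key would have to be stored separately from the research data under strict access control.

```python
import hmac
import hashlib

# Illustrative pseudonymisation: replace a direct identifier with a keyed hash.
# The secret key must be stored separately from the research data; without it,
# the pseudonym cannot be linked back to the individual.
SECRET_KEY = b"store-this-key-separately"  # placeholder, not a real key

def pseudonymise(identifier: str) -> str:
    """Return a stable pseudonym for a direct identifier such as a name."""
    return hmac.new(SECRET_KEY, identifier.encode("utf-8"), hashlib.sha256).hexdigest()[:16]

print(pseudonymise("Ola Nordmann"))  # the same input always yields the same pseudonym
```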

Several such measures have already been implemented in the PrevBOT project. For example, the Police IT Unit (PIT) restricts access to chat logs made available by local police districts, and personal data must be removed from chat logs before they are made available to CAIR/UiA. However, the Data Protection Authority encourages the data controller to consider whether other measures may also be relevant, relating to the various processing activities in the project.

If special categories of personal data are processed in the PrevBOT project, the stricter conditions in Section 9 must be met. Pursuant to Section 9, there is also a requirement that ‘the interest of society in processing the data clearly outweighs the disadvantages for the individual’.

It can be noted that the wording ‘clearly outweighs’ indicates that the threshold is high, and there must be a clear preponderance of interest. The wording here differs slightly from the wording in Article 9(2)(j), which states that it must be ‘proportionate to’. The question is therefore whether the public interest in personal data being processed in the PrevBOT project clearly outweighs the disadvantages for the data subjects whose personal data will be processed in the research (here: the victim, the perpetrator and any third parties mentioned in the chat logs).

Digital abuse of children is clearly a serious criminal act, which many have designated a public health problem. This demonstrates that society has a clear interest in conducting research on means of preventing such crime, such as PrevBOT. At the same time, the specific disadvantages for the data subjects must be thoroughly assessed, especially in relation to the safeguards implemented to limit privacy disadvantages for the data subject.

For the processing of personal data pursuant to Section 9 of the Personal Data Act, the PHS must first consult the data protection officer or another person who meets the requirements of the General Data Protection Regulation Article 37(5) and (6) and Article 38(3) first and second sentences, cf. Section 9 second paragraph. Such consultations must consider whether the processing will meet the requirements of the GDPR and other provisions laid down in or pursuant to law. However, the duty of consultation does not apply if a data protection impact assessment has been carried out pursuant to the GDPR Article 35. The Data Protection Authority emphasises here that all data protection impact assessments (DPIAs) must be submitted to the data protection officer, so that an assessment is always available.

On a general basis, the Data Protection Authority considers the provisions in Sections 8 and 9 to be somewhat unclear compared to the requirements made of how precise the supplementary legal basis should be. A weakness in Section 9 is that it is up to the Data Protection Officer, or similar, to carry out an internal assessment to clarify whether research is lawful. The Data Protection Authority therefore believes that separate research bases should be specified in special legislation that clearly sets out the framework for the research.

In summary, the processing of personal data in the PrevBOT project may meet the requirement for a supplementary legal basis, as long as the conditions in Section 8 of the Personal Data Act (and Section 9 if special categories are processed), cf. Article 6(1)(e) (and Article 9(2)(j) if special categories are processed), are met.

2. Decisions under the Police Databases Act

There is no specific legal authority in the Police Databases Act for conducting research based on criminal case data. The right to waive the duty of secrecy in Section 33 of the Police Databases Act for information used in research does not in itself provide a legal basis for processing the data in such research. It follows from Section 33(2) that the decision-making authority for waiving the duty of secrecy in criminal cases is assigned to the Director of Public Prosecutions.

In the area of health research, the Ministry has assumed in the preparatory work to the Personal Data Act that statutory decisions on exceptions to or exemptions from the duty of secrecy may provide a supplementary legal basis pursuant to Article 6(3).

Since the Police Databases Act has a similar provision on exemption from the duty of secrecy for research, it can be questioned whether a decision pursuant to Section 33 of the Police Databases Act could constitute a supplementary legal basis pursuant to the GDPR Article 6(3). This is based on a specific assessment of whether the basis for the decision is sufficiently clear in conjunction with the processing that is carried out. The more intrusive the processing, the clearer the supplementary legal basis must be.

In a letter of 27 July 2021, the Director of Public Prosecutions agreed to a special investigator in Trøndelag Police District being granted access to chat logs in criminal cases concerning sexual abuse of children, cf. the Police Databases Act Section 33. The exemption from the duty of secrecy applies to the Nettprat project, which preceded the PrevBOT project, and the Director of Public Prosecutions has subsequently consented to transferring relevant chat logs to PIT in connection with the PrevBOT project.

On the one hand, it may be argued that when an external public body, such as the Director of Public Prosecutions, decides that the duty of secrecy can be waived for research purposes, this will provide some safeguards for the processing of personal data. For example, the Director of Public Prosecutions may require that decisions set out certain measures to ensure the rights and freedoms of the data subject in the processing, following an assessment and weighing of the benefits of the processing and the consequences for the data subjects. It also follows from Section 33 of the Police Databases Act that the provision is limited to cases relating to research. The provision on which the decision is based therefore defines a group of purposes that the data can be used for (research).

On the other hand, the statements in the preparatory work to the Personal Data Act argue that the legislator’s assessment is specific to the field of health research. The Data Protection Authority is not aware of the legislator expressing this opinion with respect to Section 33 of the Police Databases Act, although the Ministry in the aforementioned preparatory works recognises that ‘processing may also take place on other grounds, such as exceptions or exemption decisions under other provisions’. Whether this statement refers only to exceptions/decisions in the health field, or whether it also has transfer value to other areas, is uncertain.

Another point is that health research is subject to relatively strict legislation through the Health Research Act. The same is not the case for other research, where it is often only the general data protection rules, and research ethics in general, that set the framework for the processing of personal data.

Thus, there are several arguments for and against decisions pursuant to Section 33 of the Police Databases Act constituting a supplementary legal basis pursuant to the GDPR Article 6(3) and Article 9(2)(j).

3. New legal authority

It may be questioned whether a possible extension of the scope of the Police Databases Act to include the processing of information for the development of artificial intelligence for police purposes would mean that such processing would then be regulated solely by the Police Databases Act. The Data Protection Authority doubts whether any regulatory or legislative change could bring the processing outside the scope of the GDPR. In the view of the Data Protection Authority, such a change would probably not affect the application of the exemption in the GDPR Article 2(2)(d).

However, a statutory provision/regulation in special legislation may provide a supplementary basis under the GDPR. This must be taken into account when formulating the legal authority to ensure that the requirements of Article 6(2) and (3) are met.

Processing of personal data relating to criminal convictions and offences

Information about criminal convictions and offences is not regarded as a special category of personal data, but the processing of such data is specifically regulated in Section 11 of the Personal Data Act and Article 10 of the GDPR. It is natural to interpret ‘criminal convictions and offences’ under these provisions as covering not only information associated with a specific judgment, but also information about criminal acts where no judgment has been reached at the time of processing.

The processing of such data is subject to certain restrictions, which follow from the GDPR Article 10. In particular, such data may only be processed under supervision of a public authority, or, if the processing is carried out by private persons, provided that there is a basis for processing pursuant to the GDPR Article 6 and a supplementary legal basis.

It is debatable whether the PHS, as a research institution, is subject to the supervision of a public authority, but a supplementary legal basis may nevertheless exist, cf. Section 11 of the Personal Data Act. The provision allows, albeit in a rather complicated manner, processing for research purposes without consent if the public interest in the processing clearly exceeds the disadvantages for the individual, cf. Section 9 of the Personal Data Act.

For processing pursuant to Section 11 of the Personal Data Act, the PHS must first consult the data protection officer to determine whether the processing will meet the requirements of the GDPR and other provisions set out in or pursuant to the law. However, the duty of consultation does not apply if a data protection impact assessment has been carried out pursuant to the GDPR Article 35.

Summary

In summary, several GDPR provisions allow, subject to certain conditions, the processing of personal data for the purpose of scientific research, including special categories and personal data relating to criminal convictions and offences.

What rights do data subjects have when their personal data is used for research?

The data subject has several rights under the data protection regulations when their personal data is processed. However, the GDPR contains some specific provisions that apply when personal data is processed for purposes related to scientific research. These specific provisions are set out in Section 17 of the Personal Data Act and restrict the general rights of the data subject. The restrictions are only applicable provided there are sufficient safeguards pursuant to the GDPR Article 89(1).

In the following, we will address some key rights regarding the disclosure and processing of personal data for research:

  • The right to information gives the data subject, among other things, the right to know who receives the personal data, who the data controller is, and the purpose of and legal basis for the processing, unless this proves impossible or would require a disproportionate effort, cf. the GDPR Article 14(1), (2) and (5)(b). The impact of the intervention on the individual must here be weighed against the interests of the research project. The nature of the information, and the fact that neither the aggrieved party, the perpetrator nor any third person has provided the information voluntarily, weigh heavily in favour of the duty to provide information. If providing information is considered impossible or would require a disproportionate effort, the controller must take appropriate measures to protect the data subject's rights, freedoms and legitimate interests, including making the information publicly available.
  • The right of access pursuant to Article 15 of the GDPR does not apply to such processing if providing access would require a disproportionate effort, or if access is likely to render it impossible or seriously impair the achievement of the objectives of the processing, cf. Section 17 first paragraph of the Personal Data Act. This exception does not apply if the processing has legal effects or direct practical effects for the data subject. If a right of access exists for the specific processing, it is important to take this into account in the design of the algorithm/AI model.
  • For the rights to rectification and restriction of processing under the GDPR Articles 16 and 18, the research design must take two factors into account: the possibility of correcting or erasing personal data in the training data, and the possibility of correcting or erasing personal data in the trained model if it contains personal data. In research cases, the rights to rectification/restriction will not apply, under the Personal Data Act Section 17 second paragraph, if exercising the rights is likely to render it impossible or seriously impair the achievement of the objectives of the processing. This must be determined through a specific assessment.
  • The right to object to the processing does not apply where personal data are processed for scientific purposes in accordance with the GDPR Article 6(1)(e), cf. Article 21(6).

Technology: The Tsetlin Machine

The Norwegian Police University College envisages building PrevBOT based on the Tsetlin Machine (TM). The strength of a TM is that it has better explainability than neural networks. In a project like PrevBOT, where people are to be categorised as potential abusers based on (in most cases) open, lawful, online communication, it will be important to be able to understand why the tool reaches its conclusions.

The Tsetlin Machine is a machine learning algorithm designed by the Norwegian researcher Ole-Christoffer Granmo in 2018. Granmo is a professor of computer science at the University of Agder (UiA) and has further developed the Tsetlin Machine with colleagues. As it is a relatively new machine learning method, research is still being carried out to explore and optimise its applications and performance. Like all machine learning models, the Tsetlin Machine depends on the quality and representativeness of the training data.

Tsetlin Machines

The Tsetlin Machine is not a type of neural network. It is an algorithm based on reinforcement learning and propositional logic, and it is suited to classification and decision-making tasks where both interpretability and accuracy are important. Propositional logic is a formal, algebraic method that evaluates sentences or statements as true or false using logical operators such as ‘and’, ‘or’, ‘not’ and ‘if ... then’.

The Tsetlin Machine learns by using reinforcement learning and learning automata. Reinforcement learning means that the model is rewarded or punished based on the results of actions taken, while learning automata make decisions based on previous experiences, and these experiences serve as guidelines for current actions.
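
As an illustration of the reward/penalty mechanism (our own minimal Python sketch, not code from the PrevBOT project), a single two-action Tsetlin automaton can be implemented as a bounded counter that drifts towards whichever action the feedback favours:

```python
import random

class TsetlinAutomaton:
    """Minimal two-action Tsetlin automaton with 2 * n states.

    States 1..n select action 0 ("exclude"), states n+1..2n select
    action 1 ("include"). Rewards push the state deeper into the
    current action's half; penalties push it towards the other half.
    """

    def __init__(self, n_states_per_action: int = 100):
        self.n = n_states_per_action
        # Start on the boundary between the two actions.
        self.state = random.choice([self.n, self.n + 1])

    def action(self) -> int:
        return 0 if self.state <= self.n else 1

    def reward(self) -> None:
        # Reinforce the current action (clamped at the outermost states).
        if self.action() == 0:
            self.state = max(1, self.state - 1)
        else:
            self.state = min(2 * self.n, self.state + 1)

    def penalize(self) -> None:
        # Weaken the current action, possibly flipping it.
        if self.action() == 0:
            self.state += 1
        else:
            self.state -= 1


# Toy usage: feedback that favours action 1 makes the automaton settle on it.
automaton = TsetlinAutomaton()
for _ in range(20):
    if automaton.action() == 1:
        automaton.reward()      # environment rewards action 1
    else:
        automaton.penalize()    # environment penalises action 0
print(automaton.action())       # converges to 1
```

A full Tsetlin Machine uses a team of such automata per clause, one per literal, to decide whether each literal is included in the clause.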

The Tsetlin Machine builds its decisions from explicit clauses, making it possible to see how each individual clause affects the decision-making process. This approach makes the Tsetlin Machine suitable for applications in which interpretability plays an important role.

Tsetlin Machines versus neural networks

Neural networks (deep learning models) require large datasets and substantial computational resources for training. The Tsetlin Machine needs fewer computational resources than complex neural networks. Research from 2020 indicates that the Tsetlin Machine is more energy efficient, consuming 5.8 times less energy than comparable neural networks in the reported experiments.

Neural networks are suitable for tasks such as prediction and image and speech recognition, identifying complex patterns and relationships in data. The Tsetlin Machine is suitable for certain types of classification problems where interpretability is important. The Tsetlin Machine uses propositional logic in decision-making. It consists of a collection of Tsetlin automata that represent logical rules. Each Tsetlin automaton has a weighted decision that is adjusted based on the learning process. The weighting determines the extent to which a specific characteristic or pattern affects the decision. This provides a higher degree of understanding because the use of logical rules enables decisions to be traced back to the individual clauses.

Neural networks are inspired by the human brain and consist of many layers of artificial neurons that are connected through many nodes and weights. Often complex and non-transparent, they are considered ‘black boxes’ due to their complexity and limited understanding of how they make decisions.

Neural networks may also inadvertently amplify and maintain biases in the training data. If the training data contains biased or discriminatory information, the model can learn and reproduce such biases in the output it generates. This can lead to unintended consequences and reinforce prejudice.

The transparency of the Tsetlin Machine means it can be examined for bias, which can then be removed from the model by modifying the propositional logic, rather than through indirect changes to the data or post-training adjustments. Bias is therefore easier to correct.

The Tsetlin Machine learns to associate words with concepts and uses words in logical form to represent a concept. An important component of this process is the use of conjunctive clauses: logical expressions that combine two or more conditions on the presence or absence of features in the input data, and that evaluate to true or false.

For example: ‘I will only go to the beach if it’s sunny and if I get time off work’. Here, ‘if it’s sunny’ and ‘if I get time off work’ represent conjunctive clauses that must be met at the same time in order for the person to make the decision to go to the beach. These clauses are used to identify patterns in the input data, by creating conditions that must be met at the same time. These clauses are then used to build up decision-making rules that form the basis for classification. The ability to handle complex conditions makes the Tsetlin Machine suitable for determining whether or not the input data belong to a specific class.
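
As a minimal Python illustration (ours, not the project's), the beach example can be written as hand-made conjunctive clauses over binary features whose votes are summed, which mirrors how a trained Tsetlin Machine combines clause outputs into a decision:

```python
# Illustrative sketch: hand-written conjunctive clauses over binary features.
# In a real Tsetlin Machine the clauses are learned, not written by hand.

features = {"sunny": True, "time_off": True, "raining": False}

# Each clause lists literals: (feature name, required value).
# A clause is true only if every literal it contains is satisfied.
clauses = [
    {"literals": [("sunny", True), ("time_off", True)], "vote": +1},  # go to the beach
    {"literals": [("raining", True)], "vote": -1},                    # stay at home
]

def clause_is_true(clause, features):
    return all(features[name] == required for name, required in clause["literals"])

score = sum(c["vote"] for c in clauses if clause_is_true(c, features))
decision = "go to the beach" if score > 0 else "stay at home"
print(score, decision)  # 1 go to the beach
```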

The workflow of the Tsetlin Machine in the PrevBOT project

PrevBOT aims to develop a transparent language model that can classify the presence of grooming in a conversation. The first step is to give the algorithm general training in the Norwegian language. This gives the algorithm a solid understanding of the language and reduces the impact of a potentially limited dataset when it is later trained on grooming language. If the training were limited to this narrow topic, there is a risk that too few examples would be available. Another important reason is that a general understanding of a language lays the foundation for developing specialised skills in a more comprehensive manner. To train the algorithm to master Norwegian in general, large Norwegian datasets are useful (available from the Norwegian Language Bank at the National Library). This can be compared to the pre-training of large language models.

Norwegian grooming

Experience from Norwegian criminal cases shows that the abuser and the child communicate in Norwegian. The technology must therefore be based on Norwegian text data, and ensuring a sufficient amount of Norwegian text data is a prerequisite for developing the AI model. At the time of writing of the sandbox report, it is uncertain whether this prerequisite is met, but it may be realised over time if the method is found to be appropriate.

Once the algorithm’s language skills have reached a sufficient level, the second step is to train it to become a specialist in grooming language classification. After acquiring basic Norwegian skills, the algorithm can then learn word context and the relevance of each word within the grooming language. This will enable the algorithm to master the language at a general level before starting the specific task of grooming detection.

The text in chat logs from criminal cases plays an important role. The examples must be very specific and precise, and should be labelled by an experienced domain expert in the field of grooming. Based on the general Norwegian language training and knowledge of grooming language classification, the algorithm will be able to recognise grooming conversations in Norwegian. A more detailed description of steps one and two follows below.

[Fact box: from chat log to algorithm]

Step one: train the algorithm in Norwegian

First, the researchers must develop Tsetlin Machine-based autoencoders that autonomously perform word embedding on large Norwegian datasets. The training consists of producing representations for words, based on these large datasets.

The Tsetlin Machine uses principles from propositional logic and logical clauses to make decisions. The figure below shows an example of the results (arrows) of propositional logic embedding using the Tsetlin Machine in a small English dataset. The Tsetlin Machine uses these clauses to build decision-making rules that form the basis for classification.

As illustrated, the results show that the words are correlated with other words through clauses. If we take the word ‘heart’ as an example, we see that it is related to ‘woman’ and ‘love’, and also associated with ‘went’ and ‘hospital’. This example shows that the word has different meanings depending on the context. It indicates that the Tsetlin Machine embedding has the capacity to learn and establish sensible correlations between words. These properties lay the foundation for better explainability and perhaps also manual adjustment.

[Fact box: word classification]
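
Exactly how the PHS will realise Tsetlin Machine word embedding is part of the research question. As a rough illustration of the kind of binary, propositional input such an approach can work on, the sketch below (our own code, with an invented corpus echoing the ‘heart’ example above) turns sentence-level co-occurrence into true/false context features:

```python
from collections import defaultdict

# Tiny invented corpus, purely for illustration of the input format.
corpus = [
    "the woman loves him with all her heart",
    "he went to the hospital with a weak heart",
]

# Sentence-level co-occurrence: every other word in the sentence becomes a
# binary context feature for the target word. A Tsetlin Machine autoencoder
# could then learn conjunctive clauses over exactly this kind of
# propositional (true/false) input.
context = defaultdict(set)
for sentence in corpus:
    words = set(sentence.split())
    for word in words:
        context[word] |= words - {word}

# The context set for 'heart' includes 'woman', 'loves', 'went' and 'hospital',
# echoing the correlations described above.
print(sorted(context["heart"]))
```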

Step two: classify grooming language

The training data must contain examples of text labelled either grooming or non-grooming. A selection of relevant rules, whether specific words, phrases or text structure, is essential to provide the algorithm with the necessary information. The algorithm identifies grooming conversations by analysing the language and recognising patterns or indicators associated with the risk of grooming. Positive examples (grooming) and negative examples (non-grooming) are used to adjust the weighting of the clauses.

The examples should in theory be an integral part of the algorithm rules and be used during the training to help the algorithm understand what characterises grooming conversations. The training data containing examples/texts labelled grooming or non-grooming is thus used as part of the training process. They are used to develop and adjust the rules that the algorithm uses to identify grooming conversations. As the algorithm trains, it analyses the labelled examples to learn patterns and indicators related to grooming. By comparing the properties of positive (grooming) and negative (non-grooming) examples, the algorithm gradually adjusts the weighting of the rules or clauses it uses for classification. This may involve assigning more weight to words or phrase structures that are associated with grooming, and less weight to those that are not. The word embedding from step one can be used for classification.

The combination of supervised learning and reinforcement learning involves repeated adjustment of the conjunctive clauses. The adjustment is normally automatic and is based on previous decisions. During training, the algorithm learns to adjust the weights to recognise patterns and make correct classifications. A fully trained model is expected not only to classify text as a potential grooming conversation, but also to be interpretable due to the transparent nature of the algorithm. The interpretation is based on the clauses in a trained Tsetlin Machine model. The clauses consist of logical rules that effectively describe whether the language indicates grooming or not. The rules for interpreting a given input sentence can be obtained from the clauses that have been activated, and these rules can then be used to explain the algorithm's decision.
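
The adjustment of clause weights described above can be illustrated with a deliberately simplified sketch (our own; real Tsetlin Machine training also learns the content of each clause and uses a feedback scheme and threshold not shown here). All example phrases are invented:

```python
# Deliberately simplified illustration of clause weighting. The clauses are
# fixed and only their integer weights are adjusted, based on whether they
# fire on grooming or non-grooming examples.

training_data = [
    ({"secret", "meet", "alone"}, 1),       # grooming example (invented)
    ({"age", "photo", "secret"}, 1),        # grooming example (invented)
    ({"homework", "football"}, 0),          # non-grooming example (invented)
    ({"meet", "training", "football"}, 0),  # non-grooming example (invented)
]

# Each clause is a set of words that must all be present for it to fire.
clauses = [{"secret", "meet"}, {"age", "photo"}, {"football"}]
weights = [0, 0, 0]

for epoch in range(10):
    for words, label in training_data:
        for i, clause in enumerate(clauses):
            if clause <= words:  # clause fires only if all its words occur
                weights[i] += 1 if label == 1 else -1

print(weights)  # clauses typical of grooming end up with positive weights
```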

Simplified overview

  1. Data collection

    Collect Norwegian text from open Norwegian sources (National Library) and chat logs from criminal cases (grooming conversations between potential victims and potential abusers) to form datasets. The datasets should contain varied examples with both positive examples (grooming conversations) and negative examples (non-grooming).
  2. Data preparation

    Structuring the data to fit the Tsetlin Machine, e.g. representing text data by means of vector representations (vectorisation of words). Bag-of-words (BOW) representations (binarisation of words) can also be used (see the code sketch after this list).
  3. Goal

    Identify relevant properties in text that distinguish between grooming and non-grooming chats, such as the use of specific words, contextual nuances/clues, sentence structures or tones of voice typical of grooming behaviour.
  4. Training

    Structured data are used for training. During training, the Tsetlin automata adjust their internal parameters to recognise patterns that are characteristic of grooming conversations. This involves adapting logical rules that take into account word choice, context and other relevant factors, specific words, expressions or patterns associated with grooming.
  5. Decision-making

    After training, the algorithm should be able to analyse and make decisions about whether text data contains indications of grooming.
  6. Feedback and fine-tuning

    The results are assessed to reduce false positives and negatives. The model is periodically adjusted based on feedback to improve accuracy over time. This may include new data, fine-tuning rules or introducing new rules to deal with changing patterns.
  7. Implementation

    Real-time detection to report suspected grooming patterns. The Tsetlin Machine predicts the probability of an online chat containing elements of grooming.
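
To show how steps 2, 4 and 5 could fit together in code, here is a hedged sketch using the open-source pyTsetlinMachine package. This is an assumption on our part: the PrevBOT project has not chosen an implementation, the package interface may differ between versions, and the vocabulary, texts and hyperparameters below are invented placeholders.

```python
# Hedged sketch: a binary grooming / non-grooming classifier trained on
# binarised bag-of-words vectors. Assumes the open-source pyTsetlinMachine
# package; all data is invented and no real chat logs are involved.
import numpy as np
from pyTsetlinMachine.tm import MultiClassTsetlinMachine

vocabulary = ["age", "secret", "meet", "homework", "football", "alone"]

def binarise(text):
    """Bag-of-words binarisation: 1 if the vocabulary word occurs, else 0 (step 2)."""
    words = set(text.lower().split())
    return [1 if w in words else 0 for w in vocabulary]

# Invented, labelled toy examples (1 = grooming, 0 = non-grooming).
texts = [
    ("keep it a secret and meet me alone", 1),
    ("how old are you, what is your age", 1),
    ("did you finish your homework", 0),
    ("are you watching the football tonight", 0),
]
X = np.array([binarise(t) for t, _ in texts], dtype=np.uint32)
y = np.array([label for _, label in texts], dtype=np.uint32)

# Hyperparameters (number of clauses, threshold T, specificity s) are
# placeholders, not values from the PrevBOT project.
tm = MultiClassTsetlinMachine(20, 15, 3.9)
tm.fit(X, y, epochs=50)  # step 4: training

# Step 5: decision on a new, invented chat line.
print(tm.predict(np.array([binarise("let us meet, it is our secret")], dtype=np.uint32)))
```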

Ethics: A responsible AI framework

Not everything that is allowed is a good idea. (Nor is it necessarily the case that everything that is a good idea is allowed.) Ethical reflection can help us see more clearly when such conflicts arise. The PHS wants PrevBOT to live up to the principles of ethical and responsible artificial intelligence, and we have tried to concretise how they can do that in the sandbox project.

For the ethical issues, too, we have focused on the research and development phase of the project. Often, however, questions about what is ‘ethically right’ in the development phase will depend on the consequences and benefits we envisage in the use phase. In this chapter, we have therefore to a greater extent envisioned alternative ways in which PrevBOT could operate, without this necessarily reflecting what the PHS has actually planned.

The goal

How can we measure whether the PHS and PrevBOT maintain the desired ethical level? What characterises the development process behind, and a product based on, responsible artificial intelligence?

‘Responsible artificial intelligence’ is not a protected term that you can use to label your AI tool if it checks off all the items on a specific list of requirements. It is a designation for artificial intelligence that maintains a certain level of accountability when it comes to the consequences of the system – in terms of both development and use – for users and society.

Ethical, responsible or trustworthy AI?

  • ‘Ethical AI’ primarily refers to adjusting artificial intelligence systems in accordance with ethical principles and values. This could be ensuring that the system does not perpetuate prejudice or injustice, and that it makes a positive contribution to human rights and welfare.
  • ‘Responsible AI’ is about operationalising ethics into practical measures, and ensuring that conscious efforts are made when developing and using AI systems to avoid harm and misuse. A general definition of responsible AI is that AI technology is developed and used in a responsible, transparent and sustainable manner.
  • ‘Trustworthy AI’ is a common term in EU contexts and refers to AI systems being lawful, ethical and robust. It is not enough that the technology is in line with laws and regulations – it must also be developed and implemented in a way that earns the trust of users and society by being reliable, secure and transparent.

Although there is considerable overlap between these concepts, the differences often lie in the emphasis: ethical AI focuses on the moral aspects, responsible AI on accountability and the operationalisation of these ethics, and trustworthy AI on earning and maintaining public trust through compliance with legal, ethical and technical standards.

Several different agencies have drawn up principles and criteria for artificial intelligence. First and foremost, there are the 2019 Ethics guidelines for trustworthy AI, prepared by an expert group commissioned by the European Commission. The OECD has developed the OECD AI Principles, which encourage innovation and responsible growth of AI that respects human rights and democratic values. In 2022, UNESCO published its Recommendation on the Ethics of Artificial Intelligence. The consulting company PwC developed nine ethical AI principles on behalf of the World Economic Forum. Over time, academic institutions, think tanks and technology players such as Google and Microsoft have come up with different approaches to ethical, responsible and trustworthy AI. Several of these principles and guidelines are general and mostly aimed at political governance. Others are more concrete, and therefore useful for developers. One example is the thorough checklist contained in the guidelines published by the European Commission’s expert group.

There are also guidelines for responsible AI that apply to a specific domain, such as the health service and the financial sector. INTERPOL and the United Nations Interregional Crime and Justice Research Institute (UNICRI) have published the document Principles for Responsible AI Innovation specifically aimed at development in law enforcement agencies, which are relevant to the PrevBOT project. 

[Fact box: INTERPOL/UNICRI principles table]

The Institute of Electrical and Electronics Engineers (IEEE) has also developed standards for the responsible and ethical development of artificial intelligence. These include standards for specific challenges, such as IEEE P7001, which focuses on transparency in autonomous systems, IEEE P7002, which addresses data protection and privacy, and IEEE P7003, which addresses algorithmic bias. The IEEE has also prepared the more comprehensive guidelines Ethically Aligned Design (EAD), which highlight key principles to ensure that the development of artificial intelligence and autonomous systems is in line with ethical norms and values.

Ethics in the national AI strategy

In the sandbox project, we choose to look at the National Strategy for Artificial Intelligence, which defines seven ethical principles for artificial intelligence based on the guidelines drawn up by the European Commission’s expert group. As such, the PrevBOT project should strive to comply with the following:

  1. AI-based solutions must respect human autonomy and control

    The development and use of artificial intelligence must foster a democratic and fair society by strengthening and promoting the fundamental freedoms and rights of the individual. Individuals must have the right not to be subject to automated processing when the decision made by the system significantly affects them. Individuals must be included in decision-making processes to assure quality and give feedback at all stages in the process ('human-in-the-loop').
  2. AI-based systems must be safe and technically robust

    AI must be built on technically robust systems that prevent harm and ensure that the systems behave as intended. The risk of unintentional and unexpected harm must be minimised. Technical robustness is also important for a system's accuracy, reliability and reproducibility.
  3. AI must take privacy and data protection into account

    Artificial intelligence built on personal data or on data that affects humans must respect the data protection regulations and the data protection principles in the General Data Protection Regulation.
  4. AI-based systems must be transparent

    Decisions made by systems built on artificial intelligence must be traceable, explainable and transparent. This means that individuals or legal persons must have an opportunity to gain insight into how a decision that affects them was made. Traceability facilitates auditability as well as explainability. Transparency is achieved by, among other things, informing the data subject of the processing. Transparency is also about computer systems not pretending to be human beings; human beings must have the right to know if they are interacting with an AI system.
  5. AI systems must facilitate inclusion, diversity and equal treatment

    When developing and using AI, it is especially important to ensure that AI contributes to inclusion and equality, and that discrimination is avoided. Datasets that are used to train AI systems can contain historical bias, or be incomplete or incorrect. Identifiable and discriminatory bias should, if possible, be removed in the collection phase. Bias can also be counteracted by putting oversight processes in place to analyse and correct the system’s decisions in light of the purpose.
  6. AI must benefit society and the environment

    Artificial intelligence must be developed with consideration for society and the environment, and must have no adverse effects on institutions, democracy or society at large.
  7. Accountability

    The requirement of accountability complements the other requirements, and entails the introduction of mechanisms to ensure accountability for solutions built on AI and for their outcomes, both before and after the solutions are implemented. All AI systems must be auditable.

Artificial intelligence and research ethics

The national strategy also points out that artificial intelligence research must be conducted in accordance with recognised standards for research ethics. In addition, it refers to the National Committee for Research Ethics in Science and Technology’s Statement on research ethics in artificial intelligence, which sets out nine principles for AI research in three areas:

  1. Responsibility for the development and use of autonomous systems:
    AI research must safeguard human dignity, assign responsibility, be possible to inspect (inspectability) and contribute to informed debate (dissemination of research).
  2. Societal consequences and the social responsibility of research:
    AI research must recognise uncertainty and ensure broad involvement.
  3. Big data:
    AI research must ensure data protection and consideration of individuals, ensure verifiability and quality, and enable fair access to data.

We demonstrate how ethical issues can be assessed against the relevant principles from the national strategy towards the end of this chapter. But first, we will try to identify, as best we can, ethical issues inherent in the PrevBOT project and look at which tools and clarifications can lay the foundation for good ethical assessments.

Straying from the ethical path

Before we attempt to concretise the implications of the ethical frameworks for the PrevBOT project, we will return to the fundamental question: Is it actually right for the Norwegian Police University College (or other institutions associated with the police authorities) to conduct research on new technology when it is fairly certain even beforehand that it will have adverse effects, but the overall scope of such effects is difficult to estimate? Or could it possibly be the first step toward straying from the ethical path?

To assess this, the sandbox project carried out what we might call a first-step analysis, inspired by the ‘just war’ tradition (Bellaby, 2016; Diderichsen, 2011; Kleinig, 2009; Syse, 2003), a line of thinking that currently lies at the core of the ethics of intelligence work and, indeed, of the use of force in general.

We did not aim to conduct a complete ethical analysis, but the analysis enables us to shed light on key issues that will hopefully provide a few guidelines for PrevBOT research in charting the course ahead. We also hope it will be useful for others to read the specific examples of ethical discussions in this report.

Slave after the first step

The philosopher Hans Jonas, known for his work on the ethical implications of modern technology and science, described how we are free to take the first step, but slaves to the steps that follow (Jonas, 1983). Although we have the freedom to initiate actions, the ensuing consequences of these actions bind us, limiting our future freedom. This underlines the importance of responsible decision-making, particularly in light of irreversible technological interventions in nature and human lives.

[Fact box: development process]

For PrevBOT, the distance from the idea and needs stage, via the prototype, design and development phases in the process outlined above, will be relatively short. This is because the functions the PHS wants to include in the bot have generally been demonstrated in other research. The PrevBOT project is therefore about getting the different parts to work together as a whole. To find out if it works, it must be tested, initially within a secure setting such as simulation. Another step has been added, however, as shown in the illustration above. Once the bot is developed and ready for testing in the intended environment, it can be difficult not to use it – in one way or another – if society or individual cases ‘demand’ it. If not by the Norwegian police, then by a commercial or other actor.

It is also easy to envisage another potential ‘demand’: a requirement that PrevBOT also stores data about the individuals who are flagged so that they can be prosecuted. If so, is it ethically right for the police not to investigate when they are handed potential evidence and abusers on a plate? Perhaps not. But a PrevBOT that can be used for investigative purposes is probably more intrusive than – and quite different from – a preventive bot. Flagging would then have greater consequences for the individuals concerned, as the material would probably have to be stored for longer and shared with other parts of the police and prosecution authorities. It may therefore be wise to design the bot in such a way that it does not put the police in this ethical dilemma at a later date.

Research is research. Each step of the development process could present both known and unknown opportunities, and known and unknown consequences. The first step could as such lead us to stray off the path, where we wander off more or less consciously and end up somewhere we did not initially want to be. That is not to say that you should never take the first step. But it is important to be aware – already at the idea and needs stage – of the potential consequences of a final product.

‘Just war’

In his consideration of whether to take the first step in the PrevBOT research project, ethics professor Jens Erik Paulsen from the PHS was inspired by the ‘just war’ tradition, and highlighted seven elements that are relevant to look at:

  1. Legitimate authority
  2. Just cause
  3. Right intention
  4. Proportionality
  5. Probability of success
  6. Last resort
  7. Consideration for innocent third parties

Ethics: First step analysis

Based on the seven elements in the previous section, the sandbox project will assess whether it is ethically right for the PrevBOT project to take the first step into the research.

Legitimate authority

Is it legitimate for the PHS to develop technologies such as PrevBOT? Is it even legitimate for law enforcement authorities to be drivers in the development of new technology?

The police have been criticised for not keeping up with the digital transformation. In autumn 2023, the Office of the Auditor General levied considerable criticism in a report stating that the police have obsolete IT systems, that there is internal dissatisfaction with digital services and tools, and that the Ministry of Justice and Public Security and the National Police Directorate had inadequate knowledge of technology and how it can be used to develop the police and prosecution services of the future.

Long-term omissions do not justify unrestricted development in the field, however. A lack of experience and knowledge may indicate that the police should now be particularly mindful when attempting to develop (or conduct research on) new and advanced technology. At the same time, the Office of the Auditor General states that the police’s failure to prioritise digitalisation and technology has reduced security and weakened crime prevention efforts. Doing nothing could therefore be an equally problematic option, ethically speaking.

Perhaps there is a point of ‘balanced advancement’ (see figure below) at which the police can take advantage of the opportunities new technology brings?

[Fact box: ethical eagerness curve]

For the PrevBOT project specifically, we are talking about serious crime and it is reasonable that the police attempt to combat it. Based on the number of reported crimes and the assumed hidden statistics, the problem is of such a magnitude and nature that the police do not consider it possible to ever get to the bottom of it. Methods to prevent or in some way avert the problem are therefore necessary. Crime prevention is in any case also the police’s main strategy.

That is not to say that such a system should only be used by the police. It is also conceivable that all or parts of a fully developed PrevBOT technology could be usefully employed by other actors, and that automated alerts could be sent and/or conversations intercepted without the police being involved. In other words, internet actors, both commercial and public, could use PrevBOT technology to moderate what takes place on their platforms.

It would nevertheless be legitimate for the PHS to be responsible for the development of such a tool. Assuming transparency of the results, there is reason to argue that it is precisely an institution linked to the police authorities that should be responsible for this research.

Just cause

Is there just cause to develop such a system? The need for protection is clear. As pointed out in the first chapter, each abuser could have tens or hundreds of victims, and the consequences of sexual abuse are a public health problem. So there is clearly just cause to take action on the issue. But are there convincing reasons for doing so in line with how PrevBOT is envisioned, by intercepting private conversations (albeit on open forums)?

Does it violate the children’s autonomy if the police keep track of and intervene in conversations on suspicion of attempted grooming? Yes. It reduces the young people’s possibility and ability to assess and decide for themselves how to cope with the situation. Yet there may be just cause to do so nonetheless. After all, the situation concerns minors, who are also entitled to protection.

Online minors, the group PrevBOT is intended to protect, are by no means a homogeneous group. There is considerable variation in how well parents look out for and guide their children on netiquette. The age of those in need of protection also varies greatly. Many of the minors using platforms where grooming occurs are almost of age, while some are as young as 10 years old. There is considerable variation in sexual development, curiosity and experience, and some variation in knowledge and experience of dealing with attempts at manipulation. In sum, how vulnerable they are varies considerably. The most vulnerable may be characterised by poor follow-up at home, low digital literacy and a high degree of risk-seeking behaviour.

The UN Convention on the Rights of the Child states that children have the right to protection. Article 34 deals explicitly with the right of children to be protected from sexual exploitation, while Article 16 deals with the right to, and protection of, a private life. Article 12 deals with respect for the views of children and recognises the right of children to be active participants in decision-making processes that affect them. Children are therefore entitled to some kind of autonomy, but this freedom seems – both in words and practice – to be subordinate to the requirement for protection.

See also ‘Barnet – et menneske uten krav på fulle menneskerettigheter?’ (‘The Child – A Human Being with No Claim to Full Human Rights?’) by Paul M. Opdal (in Norwegian only)

To avoid PrevBOT being perceived as an arbitrary interference with privacy, it is important that the bot can provide real protection. It is not enough to point to the prevalence of the sexual exploitation to be combated. Here, we must analyse the actual situation in which the bot (or the operator of the bot) is to intervene in the child’s online activity, and weigh the degree of threat on the one side against the degree of vulnerability on the other. As mentioned, the degree of vulnerability will vary, but many young people will be highly vulnerable in the sense of having little experience of recognising attempts at manipulation, often in combination with sexual curiosity and/or insecurity. The threat, in terms of the risk of attempted grooming and the consequences of any abuse, is also great. Neither party in these conversations is particularly capable of long-term thinking (about the consequences for others and for themselves, respectively). The fact that an abuser can meet a victim in a chat room, which is more or less unregulated, is obviously a problem. A tool that intercepts such meetings would provide real protection.

Sexual abuse in general, and grooming in particular, is a problem of such magnitude and complexity that one measure alone is unlikely to overcome it. However, PrevBOT can undoubtedly serve as a useful tool, and the reasons for its development appear just.

Right intention

A third aspect of the first step analysis concerns the intention for the development of a PrevBOT. In practice, this depends on an assessment of the intention. Can we assume that the idea is based on, and that development will take place, with respect for the integrity/human dignity of the parties the technology targets? Are we sure that the intention of PrevBOT is to eliminate the crime, not the people and groups as such? The police and the PHS must consider and assess this themselves.

We might both suspect and understand that it is tempting for the police to also let the bot collect evidence to start an investigation based on flagged conversations. This is also featured in early sketches of the PrevBOT. Such a version may also be compatible with good intentions and an honest purpose to fight crime. It is admittedly more pertinent to address the potential intention problem by using a purely preventive PrevBOT, which is content to uncover and intercept.

Proportionality

The principle of proportionality means that the police should not ‘use any stronger means until milder means have been attempted in vain’ (the Police Instructions Section 3-1). The benefits of preventing abuse must also be weighed against the disadvantages of the development and use of a PrevBOT.

The sandbox project has not investigated whether there are other, milder means that the police should attempt before PrevBOT. Whether the bot in this context will be a particularly strong means depends on its design. An evidence-collecting PrevBOT is likely a more powerful tool than a purely preventive bot. This means that an evidence-collecting bot can only (possibly) be justified if a purely preventive bot has been attempted in vain.

We must also consider whether the use of a PrevBOT tool would be proportional to the problem being combated. Could it be a case of using a sledgehammer to crack a nut? Sexual exploitation and abuse of children is not a ‘nut’; it is a serious crime and a public health problem. However, we must expect the tool to be accurate and its use not to affect a vast number of people who are not at risk of becoming victims or abusers. Is this kind of ‘mass surveillance’ needed to avert the crime in question? Or, in other words: can PrevBOT be designed to minimise the interference with the privacy of ‘the masses’?

Can we ensure that flagged conversations, which according to the plan will be saved to continuously improve and linguistically update the model, are not stored for longer than strictly necessary? In situations where the police have intervened with a warning, they may be required to document the electronic traces that gave grounds for the intervention. However, storing unnecessarily large amounts of personal data for an unnecessarily long time is not good for privacy and data protection. Updates could, for example, take place relatively frequently, both to avoid an extensive inventory of logs and to ensure that PrevBOT performs optimally. The project could also consider saving only the logs where a bot operator has intervened, rather than all of the flagged logs. This would provide human quality assurance, which both reduces noise in the continued learning material and strengthens privacy.

It is important that proportionality is actively considered throughout the course of the research and development processes. As part of a first step analysis, we consider the project to be in line with the principle.

Probability of success

Taking a first step can be most clearly justified if there is a reasonable probability of success. Technologically speaking, it is so well proven that machines can identify specific conversational features and conduct sentiment analyses that we can safely assume there is a reasonable chance of succeeding in creating a bot that can detect and flag grooming attempts. However, to ensure that this is not a first step toward straying from the ethical path, we need to consider whether a technically functioning PrevBOT will have a reasonable probability of succeeding in preventing CSEA in practice.

Technically, it is important, for example, that the system is fast enough to intercept before the conversation is moved to a closed forum. This concerns both the bot’s ability to detect and flag suspicious conversations in time, and whether the police’s online patrols have the capacity to follow up all conversations flagged by PrevBOT and intervene quickly enough where needed. If capacity becomes the limiting factor, PrevBOT, which is intended to provide decision-making support for human online patrols, may quickly turn into a fully automated tool. That would entail stricter legal and ethical requirements, where part of the assessment will concern whether processing performed by the tool has legal implications for, or correspondingly significantly affects, the individual.

As such, it is possible to ask how intrusive an automatic pop-up warning really is. Maybe not very, in itself. However, many will perceive a warning sent by the police ‘labelling’ them as a potential abuser or potential victim to be intrusive, even though it has no legal consequences. So the wording of the warnings, and whether it is the potential victim, the potential abuser or both who receive the warning, will require consideration.

PrevBOT’s chance of success is not just a technical or organisational issue, however. Other equally decisive factors will determine whether it works as intended: Will potential abusers be stopped by a pop-up warning on their screen? If the police are open about how the tool works, which is presumably a prerequisite if PrevBOT is to be called responsible AI (cf. the principle of transparency), the well-informed will know that ignoring the warning will not affect the risk of being caught. Is it conceivable that the most dangerous abusers will be cold-blooded enough to defy the warnings and continue their chat?

How, then, will the potential victim experience a warning of a possible grooming attempt? As mentioned, the potential victims are by no means a homogenous group. The effect of a warning will probably depend on the situation. Attempts at grooming can occur on chat or gaming platforms intended for general socialising. These are places young people may perceive as being safe home turf, where they are less vigilant and may be caught off guard by flattery and grooming attempts. A warning in such case may be an effective wake-up call.

At the other extreme are minors who have already defied warnings and have ‘snuck’ into pornographic websites with an 18-year age limit. If someone attempts to groom you in such a context, when you are lying about your age and are looking to push (sexual) boundaries: Would you be bothered about a warning about attempted grooming?

Professor Elisabeth Staksrud from the Department of Media and Communication at the University of Oslo has been monitoring children’s internet use since the 1990s. Her research shows that those who are subject to sexual abuse after meeting people online usually have a hunch that they are meeting an adult who is looking for something of a sexual nature. So a warning about that particular danger will not bring anything new to the table. This does not necessarily mean that it will not have an effect though. A warning sent by the police could ensure that the gravity of the matter sinks in. However, we do not know whether such warnings will have an equally good effect on everyone. And perhaps least effect on the most vulnerable?

In addition, the potential abusers are unlikely to be a homogeneous group when it comes to age and ‘aggression’. Some are serial abusers with a conscious plan and method for luring their victims. Others may slip more subconsciously past their normal, moral scruples, and have ‘suddenly’ done things online that they would be unlikely to do in their ‘analogue’ lives. For these potential abusers, a police warning may be effective.

Further reading: ’Dette er de norske nettovergriperne’ (‘These are the Norwegian net abusers’ (aftenposten.no) – in Norwegian only)

PrevBOT is unlikely to be 100% effective in averting abuse on the platforms where it operates, even in cases where the bot has detected grooming and attempts have been made to intercept it. But it is reasonable to believe that it will be able to stop a fair amount. The uncertainty associated with the chat participants’ reactions to warnings and police interception indicates that more research on how the tool works in practice is essential as soon as it is taken into use.

In the sandbox project, we have also discussed the use of the words ‘victim’ and ‘abuser’. The people involved may not see themselves as potential victims and potential abusers, and these kinds of expressions may seem alienating. The wording used by the police in their interception attempts could therefore be decisive to whether PrevBOT has a reasonable chance of success.

One aspect is whether the chat participant responds to the warnings. Another is whether they believe they are genuine. How should young people, who learn how to be critical internet users at school, trust that it is actually the police intervening? What if it is the warning itself that they become critical towards? Hopefully, the police’s online patrols are experienced in handling this. It is in any case a possible outcome, which is important to include when developing the project.

The above problem would be reduced if young people were well informed about PrevBOT’s existence. General awareness of the fact that the bot and the police are keeping track of online activities could also have an effect in itself.

It could of course lead to crime relocation, i.e. that the grooming moves to arenas that PrevBOT is unable to access. If the problem then moves to the darkest corners of the internet, it will in any case mean that both victims and abusers must to a much greater extent seek out the situations consciously. At present, new victims are to some degree being picked up ‘out on the open street’. If the problem relocates, the ‘streets’ will at least be safe for the majority, and the problem would be reduced, if not completely eliminated.

On the other hand, the knowledge that PrevBOT is keeping track could provide a false sense of security. If people blindly trust that ‘big brother’ will intervene in everything that seems suspicious, could they become more vulnerable to attempts that PrevBOT is unable to detect? It is relevant in this respect that the vast majority of abuse is committed by people of the same age. According to the plan, however, PrevBOT will detect large age differences between the chatters, without specifying what should qualify as a ‘large age difference’. Research shows that convicted groomers are mostly men between the ages of 25 and 45. Will a conversation between a man in his mid-20s and a girl of 15-16 be defined as a large age difference? And will it be as easy for PrevBOT to detect as if the man was 40? The greater the required age gap for PrevBOT to intervene, the fewer cases it will detect. And the more attempts by people of a similar age that go under the PrevBOT radar, the greater the false sense of security for people who believe that PrevBOT makes the platform safe.

To summarise the PrevBOT project’s chance of creating an effective tool, there are a number of factors that affect its possibility of success. Of the many possible outcomes, there are admittedly many that can be solved in the design of the tool and its use. PrevBOT is unlikely to detect or be able to avert all grooming attempts, but it will hopefully stop a fair amount. So the chance of success is reasonable enough to defend the first step into the PrevBOT research.

‘Last resort’

If there is only one way to avert this type of crime, a last resort, the requirement for proportionality could be adjusted. It is unlikely to be relevant in this context, however. PrevBOT is neither the only nor the last resort in the fight against this type of crime.

As mentioned under the section on proportionality, the police are more or less obliged to attempt a purely preventive variant of PrevBOT before developing a tool that also collects evidence and facilitates investigation.

Nor is it certain that an evidence-collecting bot would be the very last resort. Should it become relevant to proceed on that track, an assessment and comparison with other methods would be required.

Consideration for third parties

The last point in this first step analysis is about consideration for ‘innocent’ users and others who do not or should not necessarily come into close contact with a PrevBOT in action.

On the positive side, many people would probably welcome such a tool. Parents will appreciate that something is being done. Politicians will be grateful for measures that can make society safer. Had a physical space been as prone to crime as the internet, we would expect the police to send uniformed patrols there or address the problem in one way or another.

But this ‘one way or another’ does not necessarily include a PrevBOT. On the negative side, such a tool can have a chilling effect. The mere knowledge that the police have a tool that tracks, stores and potentially intervenes in our activities when we go about our lives in open, digital spaces could make us feel less free and less willing to use these arenas. Such a chilling effect could be reinforced if, in practice, PrevBOT intervenes in harmless affection between couples or consenting internet users.

Where PrevBOT is used and how it is set up will therefore be crucial. How effective should it be? How sure should it be that what is going on is grooming with a (high) risk of ending in sexual exploitation? Should it be content to warn against and scare away the most obvious serial abusers? Or should it have a lower threshold for intervening in chatters’ online attempts to challenge one another in a sexual manner, with the risk of many ‘false positive’ flags?

A chilling effect may also occur in the absence of a PrevBOT. If nothing is done and the internet continues to be perceived as an increasingly lawless and dangerous space, there is reason to believe that many will increasingly steer clear of it. Parents may want to set stricter limits on their children’s internet use. There may not be anything wrong with that in itself, but there is a risk that those remaining online will be the most vulnerable.

In other words, not taking up the fight against online crime also appears to be negative with respect to people’s sense of freedom and possibility of having a private life online.

All in all, the sandbox project concludes that the criteria in the first step analysis have been met, and that it is ethically right to initiate research on PrevBOT.

The way forward

The sandbox project has assessed and outlined how the PHS can legally conduct research into such an AI tool. However, a green light for PrevBOT research may be of little value if the tool being researched and developed will not be lawful to use.

In practice, such a tool will inevitably need to process (sensitive) personal data. Depending on how it is implemented, its use could be perceived as somewhat intrusive to the privacy of victims and abusers, as well as to random individuals whose conversations are analysed by PrevBOT while they are online.

It would probably be wise to establish a plan early on for assessing the legality of using such a tool in practice, and that could definitely be the topic of a new sandbox project.

The PrevBOT project is still at an early stage, and the way forward depends on many decisions yet to be made. From a data protection perspective, it will be particularly interesting to see whether the ambition is maintained that PrevBOT will be a preventive tool used to intercept attempts at grooming. The PrevBOT project is now clear that this is the goal. However, during the transition from idea to ready-to-use AI tool, there may be forces seeking to influence the project towards giving the tool the capability to collect evidence against and pursue abusers. The Data Protection Authority recommends that the project identifies at an early stage the uses of PrevBOT it considers unethical and undesirable, and strives during the development phase to prevent such uses from being pursued.

The desire for freedom and the desire for security are often presented as conflicting goals. The PrevBOT project is an excellent example of freedom, security and privacy being interdependent – and of it all being about finding the right balance. Minors have a right to autonomy and a private life, but without a certain level of internet security, they would not be able to exercise their autonomy and freedoms. As the tool is gradually designed in more detail, an important part of the project will be to find this equilibrium.

Trust is essential for a project that seeks to be in line with both regulations and guidelines for responsible artificial intelligence. Emphasising transparency and the involvement of relevant stakeholders through the research project provides a good basis for this.

During the course of the sandbox process, LLMs (Large Language Models) have made their breakthrough, and SLMs (Small Language Models) are set to be launched imminently. The same applies to LAMs (Large Action Models). New opportunities are emerging, and the sandbox project has identified several ways in which PrevBOT could help make the internet and everyday life safer for vulnerable groups.

The technology from a successful research project could, for example, be used in apps that run locally on phones and laptops. These would process what is visible on the screen rather than operating on the websites’ domains. Users could then configure who should be notified, in addition to the person looking at the screen who is subjected to the grooming attempt.

PrevBOT may end up being not just one tool, but the basis for a number of different measures, which together provide effective protection against online grooming.