Introduction
On December 17, 2024 [1] [5] [6] [8], the European Data Protection Board (EDPB) issued Opinion 28/2024 [1] [2] [4] [5] [6] [7] [8] [10], addressing significant data protection issues in the processing of personal data by artificial intelligence (AI) models, particularly large language models (LLMs) [2]. The Opinion gives privacy regulators a framework for assessing LLMs, clarifies how the principles of the EU General Data Protection Regulation (GDPR) apply to AI, and emphasizes privacy-preserving techniques alongside the recently adopted EU Artificial Intelligence Act (AI Act).
Description
On December 17, 2024 [1] [5] [6] [8], the European Data Protection Board (EDPB) adopted Opinion 28/2024 [1] [2] [4] [5] [6] [8] [10], which addresses critical data protection concerns related to the processing of personal data in artificial intelligence (AI) models, particularly large language models (LLMs) [2]. The Opinion provides privacy regulators with a framework for conducting individual assessments of LLMs and clarifies the application of EU General Data Protection Regulation (GDPR) principles to AI. It emphasizes the necessity of effective privacy-preserving techniques [8], such as differential privacy [8], and highlights the recently adopted EU Artificial Intelligence Act (AI Act) [8], which shares the objectives of accountability, risk-based governance [8], and transparency [6] [8] [10].
The EDPB clarifies the definitions of the ‘development’ and ‘deployment’ phases of AI models [6]. Development encompasses all activities prior to deployment [6], including code creation, data collection for training [3] [6], and the training process itself [6], while deployment refers to the active use of the AI model [6]. The Opinion asserts that AI models trained on personal data are not inherently anonymous and must be evaluated case by case by Supervisory Authorities (SAs). An AI model cannot be deemed anonymous if it is designed to provide personal data about the individuals represented in its training data [3]. For a model to qualify as anonymous [3] [5], both the likelihood of direct extraction of personal data and the likelihood of obtaining such data through queries must be negligible [1] [3] [5]. A thorough assessment of identification risks is necessary [6], taking into account the characteristics of the training data [3], the AI model [1] [2] [3] [4] [5] [6] [7] [8] [9] [10], the training process [3] [6], the context of processing [1] [3] [4] [7] [10], additional information available [3], costs [3], and available technology [3].
SAs play a crucial role in rigorously assessing claims of anonymity [1], which involves evaluating whether personal data has been effectively anonymized within the model [1]. It is essential to consider the risk of singling out individuals [3], which can be significant [3]. The approach may vary between publicly available models and those intended for internal use [3]. In determining the likelihood of identifying data subjects [1], SAs must consider all reasonable means available to the controller or others for identification [1].
Each controller is responsible for ensuring the lawfulness of data processing when deploying AI models [9]. This includes assessing whether the AI model was developed using lawfully processed personal data [9]. This task may pose challenges for AI deployers [9], who may lack the same level of information as the developers [9]. If the AI model was not developed in compliance with legal standards [9], controllers may struggle to meet their GDPR obligations [9], even if the model is provided by a third-party supplier [9].
To comply with the accountability obligations under Article 5(2) of the GDPR [3], organizations must document the measures taken to anonymize the AI model and demonstrate their effectiveness [3]. The selection of data sources should prioritize relevance and appropriateness while excluding unsuitable sources [3]. Data preparation for training should focus on using anonymous or pseudonymous data and on data minimization strategies that limit the amount of personal data used [3]. Methodological choices in training should enhance model generalization and mitigate overfitting [3], using privacy-preserving techniques such as differential privacy [3] [8]. Measures should also be taken to reduce the likelihood of extracting personal data from model outputs [3]. Comprehensive testing of the model is necessary to address known state-of-the-art attacks [3], including attribute and membership inference [3], exfiltration [1] [3] [5] [10], regurgitation of training data [3] [8], model inversion [3], and reconstruction attacks [3].
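The Opinion does not prescribe specific code, but the differential-privacy idea it cites can be sketched briefly. As an illustrative example (the function name, data, and parameter values below are ours, not drawn from the Opinion), the classic Laplace mechanism releases a counting query with noise calibrated to the query's sensitivity, bounding how much any single individual's record can influence the output:

```python
import numpy as np

def dp_count(records, predicate, epsilon, rng):
    """Release a count with Laplace noise calibrated to the query's sensitivity.

    A counting query has sensitivity 1: adding or removing one individual's
    record changes the true count by at most 1, so noise drawn from
    Laplace(scale = 1 / epsilon) yields epsilon-differential privacy.
    """
    true_count = sum(1 for r in records if predicate(r))
    noise = rng.laplace(loc=0.0, scale=1.0 / epsilon)
    return true_count + noise

# Illustrative data: ages of eight (hypothetical) individuals.
ages = [23, 35, 41, 29, 52, 38, 27, 60]
rng = np.random.default_rng(42)
noisy_count = dp_count(ages, lambda a: a >= 30, epsilon=1.0, rng=rng)
```

A small privacy budget `epsilon` makes the noise dominate, so individual records are harder to infer from the release; a large `epsilon` lets the released count track the true count closely. Training-time variants of this idea (e.g. noising gradients) follow the same calibration principle.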
The Opinion also addresses how unlawful processing of personal data during the development phase affects the lawfulness of subsequently processing or operating the AI model [5]. If a model retains personal data and is deployed by the same controller [4], the initial unlawful processing may taint subsequent use [4]. Properly anonymized models, however, fall outside the scope of the GDPR [4]. The impact of a missing legal basis for the initial processing on the lawfulness of subsequent processing must be evaluated case by case [1], particularly in light of the context of the processing activities [1]. It is therefore crucial to ensure that an AI model is anonymized before any further processing of personal data occurs [5].
The EDPB highlights that legitimate interest under Article 6(1)(f) of the GDPR cannot be the default legal basis for processing personal data in AI training and usage [6]. Data controllers must conduct a three-step Legitimate Interest Assessment (LIA): identifying a clearly defined legitimate interest, testing the necessity of the processing, and balancing that interest against the rights of data subjects [6]. This analysis weighs individuals’ rights against the benefits of the AI services, with examples including enhancing operational security and optimizing resource allocation [8]. The necessity test requires organizations to consider less intrusive alternatives for achieving their goals [8], while the balancing test depends on clearly defined interests against which benefits and risks can be assessed [8]. The EDPB encourages organizations to strengthen data subjects’ rights [8], such as the right to erasure [8], even in cases where the standard grounds do not apply [8].
Non-compliance with the GDPR can result in significant fines and corrective actions [8], including the potential erasure of datasets or AI models [8]. The EDPB underscores the authority of SAs to assess the lawfulness of processing and to exercise their powers under the GDPR [1], including imposing corrective measures in cases of infringement. Organizations developing or deploying AI models must implement strong technical and organizational measures to protect personal data [4], including pseudonymization and data minimization [4]. Data protection authorities are encouraged to evaluate AI models’ compliance with privacy rules on a case-by-case basis [4], taking into account the nature of the processed data [4], the context of processing [1] [4] [8], and the potential impacts on individuals [4] [7]. This guidance aligns with the EU AI Act [4] [8], which mandates compliance with data protection law for high-risk AI systems [4]. Self-declarations of compliance by AI providers do not guarantee adherence to the GDPR [4]. Organizations must document their data protection assessments [4], including Data Protection Impact Assessments where required [3] [4], and involve Data Protection Officers in evaluating legitimate interest assessments [4].
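Pseudonymization, one of the technical measures mentioned above, can be illustrated with a minimal sketch (the function name and sample values are ours, not from the Opinion): a keyed HMAC replaces a direct identifier with a token that stays linkable across records but cannot be recomputed without the separately held key.

```python
import hmac
import hashlib

def pseudonymize(identifier: str, secret_key: bytes) -> str:
    """Replace a direct identifier with a keyed HMAC-SHA256 digest.

    The secret key must be stored separately from the pseudonymized data:
    without it, the token cannot be recomputed or reversed, yet the same
    identifier always maps to the same token, preserving linkability.
    """
    return hmac.new(secret_key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()

# Hypothetical identifier and key, for illustration only.
token = pseudonymize("jane.doe@example.com", b"key-held-by-controller")
```

Note that under the GDPR such data remains personal data, since whoever holds the key can re-link it to the individual; pseudonymization reduces risk but does not achieve the anonymity discussed earlier in the Opinion.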
The EDPB also emphasizes the importance of transparency, urging organizations to inform users about how their online data may contribute to AI development [10]. It outlines criteria for assessing whether individuals can reasonably expect their data to be utilized in AI systems [7] [10], which include the public availability of the data [7] [10], the context of its collection [7] [10], the relationship between individuals and data controllers [7] [10], and the transparency regarding data usage [10]. This comprehensive guidance aims to enhance accountability for organizations managing sensitive personal data and to establish consistent regulatory standards across the European Union and European Economic Area.
Conclusion
The EDPB’s Opinion 28/2024 provides a comprehensive framework for addressing data protection concerns in AI models, particularly LLMs [1], under the GDPR [1] [4]. It emphasizes privacy-preserving techniques [8], accountability [3] [8] [10], and transparency [6] [8] [10], in alignment with the EU AI Act. Organizations must ensure lawful data processing, conduct thorough assessments of anonymity, and strengthen data subjects’ rights [8]. Non-compliance can lead to significant penalties, underscoring the need for robust technical and organizational measures. The guidance aims to establish consistent regulatory standards across the EU and EEA, enhancing accountability for organizations that handle sensitive personal data [10].
References
[1] https://www.lexology.com/library/detail.aspx?g=e4d9cf03-cbda-4a2b-b515-bf2da2303901
[2] https://www.techzine.eu/news/privacy-compliance/127392/eu-declares-when-an-llm-may-use-personal-data/
[3] https://www.jdsupra.com/legalnews/how-anonymous-is-your-ai-model-5158431/
[4] https://ppc.land/european-data-watchdog-clarifies-privacy-rules-for-artificial-intelligence-models/
[5] https://natlawreview.com/article/edpb-publishes-opinion-processing-personal-data-context-ai-models
[6] https://www.lexology.com/library/detail.aspx?g=f97c59dd-a98d-424c-9f5e-6f4bb595fc8d
[7] https://www.irishlegal.com/articles/landmark-opinion-sets-out-eu-rules-on-personal-data-and-ai-models
[8] https://www.apm.law/edpb-opinion-privacy-implications-for-ai-models/
[9] https://www.lexology.com/library/detail.aspx?g=ad5dd9ad-a74d-4f88-85e6-66c50f544075
[10] https://www.techmonitor.ai/digital-economy/ai-and-automation/edpb-ai-data-guidance-harmonise-gdpr-compliance




