Caro Robson

NOYB files complaint against OpenAI with Austrian DPA

29 April 2024



 

NOYB has filed a complaint against OpenAI with the Austrian Data Protection Authority (DSB) on the grounds that OpenAI cannot meet data subjects’ rights of access, erasure, or rectification, or guarantee the accurate processing of personal data.


The complaint relates to a public figure who asked ChatGPT for his date of birth and repeatedly received incorrect answers. OpenAI responded to the subject’s request for rectification or erasure by saying it was not possible to correct the data, and it refused to provide information on the data processed, including its sources. NOYB filed a complaint with the DSB on the subject’s behalf today.


This may be the first of a number of complaints brought against providers of large language models (LLMs), which present significant data protection issues. One key question is whether the models themselves constitute personal data.


LLMs are made up of a large number of “tokens” (sub-word units of text) that are linked to other tokens through billions of “parameters”: weights encoding the statistical likelihood of the tokens being connected in any given text. How the models themselves fit the definition of “personal data” is a complex question.
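The idea of tokens linked by statistical likelihood can be illustrated with a toy sketch (Python, purely illustrative: real LLMs learn their parameters by gradient descent over vast corpora, not by counting pairs, and the corpus here is invented):

```python
from collections import Counter, defaultdict

# Toy corpus standing in for training text (invented for illustration).
corpus = "the cat sat on the mat the cat ran".split()

# Count how often each token follows each preceding token -- a crude
# stand-in for the "parameters" that link tokens in a real model.
pair_counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    pair_counts[prev][nxt] += 1

def next_token_probability(prev: str, nxt: str) -> float:
    """Estimated P(nxt | prev): how likely 'nxt' is to follow 'prev'."""
    total = sum(pair_counts[prev].values())
    return pair_counts[prev][nxt] / total if total else 0.0

print(round(next_token_probability("the", "cat"), 3))  # "cat" follows "the" 2 of 3 times
```

The point for data protection is that such a model stores statistical associations between tokens rather than discrete records about individuals, which is why fitting it to the definition of “personal data” is hard.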


Behind the models is the collection and pre-processing of training data, followed by training and evaluation of the models. These stages more obviously constitute processing of personal data: 


🔹 Data collection: may include web scraping or transcribing web content


🔹 Data pre-processing: includes improving the (technical) quality of data, cleaning, labelling and transforming it into training data (often performed manually by workers in the Global South)


🔹 Training the model: using the pre-processed training data to create and adjust the statistical connections (parameters) between tokens


🔹 Evaluation of the model: assessing how the model functions, often using Reinforcement Learning from Human Feedback (RLHF), i.e. further human intervention
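The four stages above can be sketched as a toy pipeline (illustrative Python only; the function names, data, and logic are my assumptions, not any provider’s actual process). It shows the data protection concern concretely: personal data present in scraped text survives pre-processing and ends up embedded in the trained artefact.

```python
# Hypothetical sketch of the pre-market LLM lifecycle stages listed above.

def collect(sources: list[str]) -> list[str]:
    """Data collection: e.g. scraped web pages, which may contain
    names, dates of birth, and other personal data."""
    return list(sources)

def preprocess(raw: list[str]) -> list[str]:
    """Pre-processing: cleaning and transforming pages into training
    examples (in practice often done manually)."""
    return [doc.strip().lower() for doc in raw if doc.strip()]

def train(examples: list[str]) -> dict[str, int]:
    """Training: derive 'parameters' (here, trivially, token counts)
    from the examples."""
    params: dict[str, int] = {}
    for doc in examples:
        for token in doc.split():
            params[token] = params.get(token, 0) + 1
    return params

def evaluate(model: dict[str, int]) -> bool:
    """Evaluation: RLHF would add human feedback at this stage; this
    stub just checks the model is non-empty."""
    return len(model) > 0

# Invented input containing personal data, to show it persists into the model.
model = train(preprocess(collect(["Jane Doe, born 1 April 1980 ", ""])))
print(evaluate(model), "jane" in model)  # the name survives as tokens
```

Each function corresponds to one bullet above, and each is a stage at which personal data is processed before the model ever reaches the market.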


These stages take place before the model is placed on the market. LLMs present a number of fundamental data protection challenges, at every stage of their lifecycles. The EDPB has created a task force on ChatGPT to consider these questions. 


I suspect the NOYB complaint may be the first of many…
