What is the Place of Synthetic Data in US and EU Laws?

At present, most of the data used to train Artificial Intelligence (AI) systems is derived from events occurring in the physical world or from individual attributes (referred to as “collected” data). Nevertheless, a significant transformation is on the horizon. A Gartner study predicted that by 2024, approximately 60% of the data used to train AI systems worldwide would be Synthetic Data.

Synthetic Data is artificial data with analytical value, generally generated by computer simulations or algorithms. There are various methods of generating it: Synthetic Data can be produced by transforming collected data, or it can be generated without (direct) use of collected data. It is generally understood that US federal laws and European laws cover “collected” data. But do the current legal systems of the United States and the European Union provide for Synthetic Data? Does Synthetic Data qualify as personal data? What is the place of Synthetic Data in US and European law? This article examines these questions.

In the United States, various federal laws cover different aspects of data protection. Some of the most important are the Privacy Act of 1974, which governs how federal agencies can collect and use data about individuals in their systems of records; the Health Insurance Portability and Accountability Act (HIPAA), which creates standards for how healthcare providers can use a patient’s personal health data; the Children’s Online Privacy Protection Act (COPPA), which places restrictions on the collection of personal information from children on websites and online services; the Federal Trade Commission Act (FTC Act), which empowers the FTC to protect consumers against unfair or deceptive practices, including those related to data collection and privacy; and the Fair Credit Reporting Act (FCRA), which covers the collection and use of data contained in consumer reports. Notably, unlike the European Union, the United States does not have a single law that covers the privacy of all types of data. This fragmentation can make it challenging to determine the legal status of Synthetic Data in various contexts.

The European Union (EU), by contrast, has a comprehensive data privacy law, the General Data Protection Regulation (GDPR), which came into effect in 2018. This regulation imposes obligations on organizations anywhere in the world that target or collect data related to people in the EU.

On a cursory reading, the word “collect” appears repeatedly in all of the aforementioned US and EU data protection laws, but nowhere is the term “collected data” defined. Nor do these laws mention “synthetic” or “artificial” data. This suggests that legislators did not anticipate Synthetic Data when drafting them. Because none of these laws defines “collected data”, Synthetic Data generated from non-collected data could be considered to fall outside their scope. Where Synthetic Data is generated without direct use of collected data, it may match the key statistical properties of personal data, but it does not map back to any actual natural person. Such Synthetic Data therefore arguably does not qualify as personal data, because it does not contain information about an identifiable individual.
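The distinction between the two generation routes can be illustrated with a minimal Python sketch. It is a toy illustration under stated assumptions, not a description of how any real synthetic data product works: the function names, the sample ages, and the assumed mean and standard deviation are all hypothetical and invented for this example, and the sketch says nothing about whether either output would be treated as personal data.

```python
import random
import statistics


def synthesize_from_collected(collected_ages, n, seed=0):
    """Route 1: fit simple statistics on collected data, then sample new values."""
    rng = random.Random(seed)
    mu = statistics.mean(collected_ages)
    sigma = statistics.stdev(collected_ages)
    return [round(rng.gauss(mu, sigma)) for _ in range(n)]


def synthesize_without_collected(n, assumed_mean=40.0, assumed_sd=12.0, seed=0):
    """Route 2: sample from assumed parameters; no collected records are read."""
    rng = random.Random(seed)
    return [round(rng.gauss(assumed_mean, assumed_sd)) for _ in range(n)]


# Hypothetical "collected" ages, invented for illustration only.
collected = [23, 35, 41, 29, 52, 47, 38, 31]

derived = synthesize_from_collected(collected, n=1000)
simulated = synthesize_without_collected(n=1000)

# Compare the average age in each data set.
print(statistics.mean(collected), statistics.mean(derived), statistics.mean(simulated))
```

In the first function the output depends on the collected records; in the second it depends only on parameters chosen by the modeller. That dependence, or its absence, is the factual difference on which the legal analysis turns.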

However, where Synthetic Data is generated through transformation of collected data, these regulations do have provisions on pseudonymous and anonymous data that might be considered to cover it. Under Article 4(5) of the GDPR, pseudonymisation means processing personal data so that it can no longer be attributed to a specific data subject without the use of additional information; pseudonymous data therefore remains personal data. Anonymous data is described in Recital 26 of the GDPR as information that does not relate to an identified or identifiable natural person, or personal data rendered anonymous in such a manner that the data subject is not or no longer identifiable. Synthetic Data derived from collected data might therefore arguably qualify as pseudonymous or anonymous data, depending on whether individuals can still be identified from it.

In conclusion, to the extent that Synthetic Data is a transformation of collected data, the data protection laws in the US and EU likely cover it. However, where Synthetic Data is not generated from collected data, the current laws in the EU and US fall short of regulating it.

By Boru Gollo, LLM'24