Jacob Harrisburg

New European AI laws may impact US firms.

On May 11, the European Parliament's lead committees approved their draft of the EU Artificial Intelligence Act. Generative AI tools such as ChatGPT, DALL-E, Google Bard, and Stable Diffusion have emerged and spread rapidly. It is therefore unsurprising that lawmakers revised the text in recent weeks to address these technologies.


Notably, the amendments introduced new transparency and disclosure requirements aimed specifically at large language models. The Parliament's draft groups these with other "foundation models." Such models are trained with machine learning techniques on very large datasets so they can process and generate content.

Generative AI systems such as ChatGPT and DALL-E are trained on datasets composed largely of publicly available material gathered through web scraping. As a result, artists and other creators have raised concerns about the "misappropriation" of their original works by these systems. They also object that the companies developing these systems do not compensate them.



The purpose of this provision is to protect creators and copyright holders. However, it could create unintended difficulties for American AI firms operating in the European market. American AI companies that want to serve EU users should understand these rules. They should also consider more deliberate and carefully documented approaches to collecting training data.


Article 28b(4)(c) of the draft EU AI Act requires providers of generative AI systems to "document and make publicly available a sufficiently detailed summary of the use of training data protected under copyright law, without prejudice to national or Union legislation on copyright." The difficulty is that it is practically impossible to identify every copyrighted work in a training set, down to individual texts or images. It is equally difficult to link generated content back to those works. For instance, GPT-3's training data was drawn from roughly 45 terabytes of compressed text. At that scale, and across such diverse sources, tracing specific pieces of data is not feasible.
