A legal battle between OpenAI and major publishers has taken an unexpected turn after the AI company accidentally deleted critical evidence in an ongoing copyright lawsuit. The lawsuit, filed by The New York Times and the Daily News, accuses OpenAI of using their content to train its AI models without permission.
As part of the case, OpenAI provided two virtual machines for the plaintiffs to search its AI training data for potentially copyrighted content. However, on November 14, OpenAI engineers accidentally erased data from one of the machines.
While most of the data was recovered, the folder structures and file names were lost. Without them, the plaintiffs could not use the recovered data to determine whether, or where, their content had been used.
The deletion has caused serious delays. The plaintiffs’ legal team and experts, who had already spent over 150 hours searching the data since November 1, now have to redo a week’s worth of work.
In a letter filed Wednesday, they expressed frustration, stating that the incident underscores the need for OpenAI to conduct its own searches using internal tools.
OpenAI has not commented on the incident. The plaintiffs, for their part, said they don't believe the deletion was intentional.
The case raises larger questions about how AI companies handle copyrighted materials. OpenAI maintains that its use of publicly available data for training AI models like GPT-4 is fair use, even when such models generate revenue.
Nonetheless, OpenAI has struck licensing deals with publishers such as the Associated Press and Dotdash Meredith, reportedly paying millions for content access.
This incident highlights the complexities at the intersection of AI, copyright law, and transparency in data handling. As the lawsuit proceeds, the broader implications of AI training practices remain under scrutiny.