UC Riverside researchers develop method for erasing private data from AI models

James B. Milliken

A team of computer scientists at the University of California, Riverside (UC Riverside) has introduced a method to remove private and copyrighted data from artificial intelligence models without needing access to the original training data. This work was presented in July at the International Conference on Machine Learning in Vancouver, Canada.

The new approach addresses concerns that personal and copyrighted material can remain embedded in AI models, where users may still be able to retrieve it even after efforts to delete or protect that information. These concerns have grown as technology companies face stricter privacy regulations, including the European Union’s General Data Protection Regulation and California’s Consumer Privacy Act.

“In real-world situations, you can’t always go back and get the original data,” said Ümit Yiğit Başaran, a UCR electrical and computer engineering doctoral student and lead author of the study. “We’ve created a certified framework that works even when that data is no longer available.”

Recent legal actions, such as The New York Times lawsuit against OpenAI and Microsoft over the use of its articles for training AI models, highlight ongoing disputes about copyrighted content being used without authorization.

AI models typically learn language patterns from large datasets scraped from online sources. Sometimes these models generate text that closely resembles their training material, allowing users to access content that might otherwise be behind paywalls.

The UC Riverside team—Başaran, professor Amit Roy-Chowdhury, and assistant professor Başak Güler—developed what they call a “source-free certified unlearning” method. This technique enables developers to remove specific data by using a surrogate dataset that statistically matches the original data. The system adjusts model parameters and introduces controlled random noise to ensure targeted information is erased.

Their framework builds on existing optimization techniques that estimate how retraining a model without the deleted data would change its parameters. The team extended this with a new noise-calibration mechanism that accounts for differences between the original and surrogate datasets.
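To make the idea concrete, here is a minimal sketch of one common form of this kind of update: a single Newton step that approximates retraining without the deleted points, for an L2-regularized logistic regression model. This is not the UC Riverside team's implementation. The model choice, the function names (`train`, `unlearn`, `grad_sum`, `hessian_mean`), and the noise scale `sigma` are hypothetical. The sketch assumes the trained weights are an exact minimizer of the training objective and that the surrogate data come from the same distribution as the original data. Only the data to be forgotten and the surrogate set are used, never the retained training data. The noise is simply Gaussian and is not calibrated to any certified privacy bound.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def grad_sum(w, X, y):
    # Sum of per-example logistic-loss gradients; labels y are in {-1, +1}.
    return -(X.T @ (y * sigmoid(-y * (X @ w))))

def hessian_mean(w, X, lam):
    # Mean per-example logistic-loss Hessian plus the L2 term.
    # For this model the Hessian does not depend on labels, so X alone suffices.
    p = sigmoid(X @ w)
    return (X.T * (p * (1 - p))) @ X / len(X) + lam * np.eye(X.shape[1])

def train(X, y, lam, iters=50):
    # Newton's method on (1/n) * sum(loss) + (lam/2) * ||w||^2.
    w = np.zeros(X.shape[1])
    for _ in range(iters):
        g = grad_sum(w, X, y) / len(X) + lam * w
        w -= np.linalg.solve(hessian_mean(w, X, lam), g)
    return w

def unlearn(w, X_forget, y_forget, n_train, X_surrogate, lam, sigma, rng):
    # Hypothetical source-free removal step (assumes w exactly minimizes the full objective).
    m = len(X_forget)
    # Gradient of the retained-data objective at w, derived from the forget set alone:
    # the full gradient is zero at the optimum, so the retained part is minus the forgotten part.
    g_retained = (-grad_sum(w, X_forget, y_forget) - m * lam * w) / (n_train - m)
    # The retained-data Hessian is unavailable, so estimate it from surrogate data.
    H_surrogate = hessian_mean(w, X_surrogate, lam)
    w_new = w - np.linalg.solve(H_surrogate, g_retained)
    # Add Gaussian noise (illustrative scale only, not a calibrated certificate).
    return w_new + sigma * rng.standard_normal(w.shape)

rng = np.random.default_rng(0)
w_true = np.array([1.0, -2.0, 0.5])
X = rng.normal(size=(500, 3))
y = np.where(rng.random(500) < sigmoid(X @ w_true), 1.0, -1.0)
X_surrogate = rng.normal(size=(500, 3))  # same distribution, disjoint samples
lam = 0.1

w_full = train(X, y, lam)
w_retrain = train(X[20:], y[20:], lam)  # costly baseline: retrain from scratch
w_unlearn = unlearn(w_full, X[:20], y[:20], len(X), X_surrogate, lam, sigma=0.0, rng=rng)
print(np.linalg.norm(w_full - w_retrain), np.linalg.norm(w_unlearn - w_retrain))
```

Comparing the two printed distances shows whether the one-step update moves the model closer to the retrained model than the original weights were. It does this without revisiting the 480 retained examples. A nonzero `sigma` is what a certified method would tune so that the noise masks the remaining gap to retraining, including the error introduced by using surrogate rather than original data.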

Tests with both synthetic and real-world datasets showed that their method provides privacy protections similar to full retraining but uses much less computing power.

Currently, this work applies to simpler AI models but could eventually extend to more complex systems like ChatGPT, according to Roy-Chowdhury, co-director of UCR’s Riverside Artificial Intelligence Research and Education (RAISE) Institute.

Beyond meeting regulatory requirements, the researchers believe their technique could benefit organizations handling sensitive data—including media outlets and medical institutions—and allow individuals to request removal of personal or copyrighted content from AI systems.

“People deserve to know their data can be erased from machine learning models—not just in theory, but in provable, practical ways,” Güler said.

Future plans include refining the method for more complex models and building tools for broader use among AI developers worldwide.

The research paper is titled “A Certified Unlearning Approach without Access to Source Data.” It was completed with Sk Miraj Ahmed from Brookhaven National Laboratory in Upton, NY. Both Roy-Chowdhury and Güler hold faculty positions in UCR’s Department of Electrical and Computer Engineering with secondary appointments in Computer Science and Engineering.
