A new study by researchers at the University of California San Diego and the University of Chicago examines how visual artists can protect their work from being used without consent by generative AI tools. The findings will be presented at the 2025 Internet Measurement Conference in Madison, Wisconsin.
The research highlights that while preventing AI crawlers from accessing creative works is one of the best protective measures, most artists lack either access to necessary tools or knowledge about how to use them. “At the core of the conflict in this paper is the notion that content creators now wish to control how their content is used, not simply if it is accessible. While such rights are typically explicit in copyright law, they are not readily expressible, let alone enforceable in today’s Internet. Instead, a series of ad hoc controls have emerged based on repurposing existing web norms and firewall capabilities, none of which match the specificity, usability, or level of enforcement that is, in fact, desired by content creators,” according to the researchers.
The team surveyed more than 200 visual artists about their efforts to block AI crawlers and analyzed more than 1,100 professional artists' websites for evidence of measures to control crawler access. The survey found that nearly 80% of respondents had taken proactive steps to keep their art out of the training data for generative AI models. Two-thirds reported using Glaze, a tool developed by co-authors at the University of Chicago that adds subtle perturbations to images to disrupt a model's ability to learn and mimic an artist's style.
Many artists have also scaled back what they share online: 60% said they post less work, and 51% upload only low-resolution images. Still, demand for effective technical measures remains high, with 96% wanting access to a tool that could deter AI crawlers. Yet more than 60% were unfamiliar with robots.txt, one simple way to ask crawlers to keep out.
A robots.txt file can tell web crawlers which pages or sites are off-limits, but it cannot enforce compliance. The researchers found that over three-quarters of artist websites were hosted on third-party platforms where modifying robots.txt was not possible. Among the platforms studied, only Squarespace provided an interface for blocking AI tools via robots.txt, and just 17% of Squarespace users had enabled this option.
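To illustrate how these directives work, here is a minimal sketch using Python's standard urllib.robotparser. The bot names GPTBot and Bytespider are real crawler user agents, but the rules, site, and paths are invented for demonstration and are not drawn from the study:

```python
from urllib import robotparser

# Illustrative robots.txt that asks two AI crawlers to stay out entirely
# while leaving the site open to everyone else.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: Bytespider
Disallow: /

User-agent: *
Allow: /
"""

parser = robotparser.RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# A well-behaved crawler checks these rules before fetching a page;
# a non-compliant one simply ignores them, since robots.txt is advisory.
for agent in ("GPTBot", "Bytespider", "SomeBrowser"):
    allowed = parser.can_fetch(agent, "https://example-artist.com/portfolio/")
    print(f"{agent:12s} may fetch the portfolio page: {allowed}")
```

Running this prints False for the two AI crawlers and True for the generic browser agent, which captures both the appeal and the limitation of robots.txt: it expresses the site owner's wishes precisely, but honoring them is left entirely to the crawler.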
Large company-operated crawlers generally respect robots.txt instructions; Bytespider, operated by TikTok owner ByteDance, was a notable exception. Many other crawlers claim compliance, but the researchers found this difficult to verify. “The majority of AI crawlers operated by big companies do respect robots.txt, while the majority of AI assistant crawlers do not,” they write.
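In practice, verifying compliance comes down to comparing what a crawler actually fetched against what robots.txt permitted. The sketch below shows that idea in miniature; the access-log lines are invented, the log format is assumed to carry a trailing user-agent field, and the crude user-agent parsing is for illustration only:

```python
import re
from urllib import robotparser

# Illustrative rules: both AI bots are asked to stay out entirely.
parser = robotparser.RobotFileParser()
parser.parse("""\
User-agent: GPTBot
Disallow: /

User-agent: Bytespider
Disallow: /
""".splitlines())

# Invented access-log lines, roughly Common Log Format plus a user-agent field.
ACCESS_LOG = [
    '1.2.3.4 - - [01/May/2025:10:00:00 +0000] "GET /portfolio/ HTTP/1.1" 200 512 "-" "GPTBot/1.0"',
    '5.6.7.8 - - [01/May/2025:10:01:00 +0000] "GET /portfolio/ HTTP/1.1" 200 512 "-" "Bytespider"',
]

LOG_RE = re.compile(r'"GET (?P<path>\S+) HTTP/[\d.]+" \d+ \d+ "[^"]*" "(?P<ua>[^"]+)"')

for line in ACCESS_LOG:
    m = LOG_RE.search(line)
    if not m:
        continue
    ua, path = m["ua"], m["path"]
    bot = ua.split("/")[0]  # crude user-agent to bot-name mapping
    if not parser.can_fetch(bot, path):
        print(f"violation: {bot} fetched {path} despite a Disallow rule")
```

A real measurement would need careful user-agent attribution and IP verification rather than a script this simple, but the underlying test is the same: a request that robots.txt disallows, showing up in the logs anyway, is direct evidence of a non-compliant crawler.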
Cloudflare recently introduced a feature that lets customers block certain AI bots directly at the network level; however, only a small percentage of websites (5.7%) currently use it.
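One rough way for a site owner to check whether such network-level blocking is in effect is to request a page while presenting an AI crawler's user-agent string and compare the result with an ordinary request. The sketch below does this with Python's standard library; the URL is a placeholder, and a 403 response is a typical but not guaranteed sign that a block rule fired, since edge services may also rely on signals beyond the user-agent header:

```python
from urllib import request, error

URL = "https://example-artist.com/"  # placeholder site

def status_for(user_agent: str) -> int:
    """Fetch URL with a given User-Agent and return the HTTP status code."""
    req = request.Request(URL, headers={"User-Agent": user_agent})
    try:
        with request.urlopen(req, timeout=10) as resp:
            return resp.status
    except error.HTTPError as exc:
        return exc.code

baseline = status_for("Mozilla/5.0 (ordinary browser)")
as_bot = status_for("GPTBot/1.0")

if baseline == 200 and as_bot in (401, 403):
    print("AI-bot user agent appears to be blocked at the network edge")
else:
    print(f"no clear block observed (browser={baseline}, bot={as_bot})")
```

A probe like this only tests user-agent-based blocking; it says nothing about which bots a provider's block list actually covers, which is precisely the transparency gap the researchers point to below.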
“While it is an ‘encouraging new option’, we hope that providers become more transparent with the operation and coverage of their tools (for example by providing the list of AI bots that are blocked),” said co-author Elisa Luo, a Ph.D. student at UC San Diego.
The legal environment around data scraping for AI training varies worldwide and remains unsettled in many regions, including the United States and the European Union; the EU recently passed legislation requiring authorization from copyright holders before their data can be used to train models.
“There is reason to believe that confusion around the availability of legal remedies will only further focus attention on technical access controls,” write the researchers. “To the extent that any U.S. court finds an affirmative ‘fair use’ defense for AI model builders, this weakening of remedies on use will inevitably create an even stronger demand to enforce controls on access.”
This study received funding support from NSF grant SaTC-2241303 and Office of Naval Research project #N00014-24-1-2669.