The New York Times Doesn’t Want AI Scraping Its Content
The ChatGPT phenomenon has very quickly entered the U.S. workforce, with some employers, including Microsoft and Google, already restricting its use.
With justified concerns that current AI chatbot technology could cause significant harm to their businesses, media publishers are starting to look more closely at whether it makes sense to integrate AI tools like ChatGPT into their everyday operations.
Earlier this month, The New York Times updated its Terms of Service to prohibit its content from being used to train any machine learning or AI model, such as ChatGPT, as first reported by Adweek.
The prohibition covers any text, photos, audio/video clips, and metadata, and extends to any web crawler or similar tool built to scrape, access, or collect NYT content for the purpose of developing AI-powered software or improving the underlying models behind it.
The announcement also touched on the digitization of the NYT’s archive of photos and clippings, as well as the machine learning techniques provided to content moderators through its Moderator system.
Publishers Sign Open Letter Calling on Lawmakers to Craft New Policies
Earlier this month, other news publishers, including The Associated Press and the European Publishers’ Council, signed an open letter calling on global lawmakers to craft new policies that would require consent from copyright holders, along with transparency about training datasets, before copyrighted content is used to train AI systems.
“The media industry has a history of embracing and successfully navigating new technology, from the introduction of the printing press to broadcast media to the internet and social media,” the letter reads in part, noting that “the pace of development and adoption of AI far exceeds that of prior technological leaps, and it does so at the potential expense of long-standing foundational intellectual property rights, as well as the creators’ investments in high-quality media content.”
Current signatories of the letter also include Gannett (USA Today Network), Getty Images, The Authors Guild, News Media Alliance, National Writers Union, National Press Photographers Association, and Agence France-Presse.
NYT Inks $100M Deal With Google to Showcase Content
In February, the NYT and Google signed a $100 million deal that allows Google to feature NYT content across some of its platforms over the next three years, with the two companies also working together on tools that improve how the Times’ content is distributed, marketed, and advertised to consumers.
In July, Google updated its privacy policy to cover its AI-based products Bard and Cloud AI, which it said could be trained on public data the tech giant scrapes from the internet, as first reported by Gizmodo.
A Google representative recently told The Verge that the company’s privacy policy has long been transparent about using publicly available information from the open web to train language models that power services like Google Translate, and that the recent update simply expands that notice to cover Bard and other AI tools.
In a blog post, OpenAI acknowledged the potential for misuse of GPTBot, its new web crawler that reportedly collects publicly available data from websites while avoiding paywalled and restricted content. The company also said that website operators can now prevent GPTBot from scraping their sites, whether by disallowing the crawler in a robots.txt file or by blocking its IP address.
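For site owners who want to opt out, the robots.txt route is straightforward. Below is a minimal sketch using the GPTBot user-agent token OpenAI publishes; the directory path in the commented alternative is a placeholder, not any publisher’s actual configuration:

```
# Block OpenAI's GPTBot from the entire site
User-agent: GPTBot
Disallow: /

# Or, alternatively, let GPTBot read only selected sections
# (the /public/ path is illustrative):
# User-agent: GPTBot
# Allow: /public/
# Disallow: /
```

Because robots.txt is honored voluntarily by the crawler, operators who want a harder guarantee can instead block the IP ranges OpenAI publishes for GPTBot at the network level.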
A recent poll from Reuters surveyed 2,625 adults across the U.S. between July 11 and 17, revealing that 28% of respondents were regularly using ChatGPT at work, while only 22% stated that their employers explicitly allowed the use of these AI-based tools.
On the other hand, roughly 10% of those polled said their employers explicitly banned outside AI tools like ChatGPT, while 25% didn’t know whether their companies allowed the technology at all.