Google’s new content licensing deal with Reddit is a hot topic among tech professionals and social media users alike. Still, the search giant stands to gain more than just AI training data from its newfound access to Reddit’s API.
The deal, reportedly worth $60 million annually, will grant Google access to Reddit’s diverse swaths of user-generated content. Diversity of content is key as AI companies look to enhance their models in order to gain a competitive edge.
What else is Google getting out of this agreement? Here’s a look at how Reddit’s data will be used and what Google will likely glean from it.
Reddit’s Data Is a Treasure Trove for AI Training
Headlines about the deal have focused on the vast amount of Reddit data that Google will be able to use to train its large language models, and rightfully so.
Authoritative information, such as the content published by journalistic and academic sites, has been key thus far in training these models, and it will continue to play a crucial role. However, LLMs also need to be able to communicate in a variety of vernaculars.
That’s why user-generated content from social media platforms like Reddit is so vital. It can help models understand how ordinary people interact, especially informally.
And because Reddit’s data is well organized, and even ranked by the upvote system, it’s extremely usable, which may allow Google to train models more efficiently and outpace AI competitors.
Google Stands to Learn Even More About User Intent
While it’s difficult to overstate the value of Reddit’s data for AI training, the new deal will also give Google valuable insight into user intent.
AI has come to the forefront of tech in the last year and a half, but it’s not what Google is known for.
The verb “Google” is synonymous with Search. With real-time access to Reddit’s user-generated content—and potentially, data from other social media platforms down the road—Google will be equipped to make tailored, personalized enhancements to Search in order to deliver the utmost value to every user.
Delivering a valuable search experience requires a thorough understanding of user intent. Google uses a variety of signals to understand how best to satisfy each query. The search engine aims to understand not just the “what,” but also the “why.”
This is key to providing the most useful results for every searcher. User intent will remain at the core of Search, and it will become even more vital if Google’s AI comes to play a larger role.
Reddit’s data will help Google understand how different people search, enabling the search engine to drill down even further in order to dramatically improve the quality of results.
In announcing the new Reddit deal, Google described the platform as having “an incredible breadth of authentic, human conversations and experiences.” This sort of data is exactly what AI developers and search engines need to understand how people converse naturally—an element as crucial to AI development as it is to providing more helpful search results.
Maintaining or even further cementing Google’s status as the world’s preferred search engine isn’t the only way Reddit’s data will prove valuable.
Through Ads, Google could roll out replacements for social listening methods widely utilized by digital marketers.
Currently, social listening is often a manual, piecemeal process. But with data like Reddit’s at its disposal, Google could deliver game-changing marketing products. Such solutions could prove extremely valuable, especially as marketers lose the third-party cookies they’ve depended on for so long.
The Google-Reddit Deal Isn’t the Only One
Other AI companies have engaged in or are actively pursuing licensing deals to gain access to training data free from the risk of legal action.
OpenAI, for instance, entered into a multiyear deal with Axel Springer, the owner of Insider, Politico, and many other media brands, to use their content with permission.
The Associated Press also has an agreement with OpenAI, under which the news publisher shares a portion of its text archive.
The big difference between these deals and the Google-Reddit agreement, of course, is that news publishers provide authoritative information, whereas Reddit’s user-generated content will prove far more dynamic, insightful, and valuable in AI training.
To gain more insights, connect with VELOX Media.
Published by: Nelly Chavez