What happens when we train our AI on social media?

The unique relationship between social media and AI continues to develop in interesting ways, with Reddit’s recent deal with Google allowing Google’s AI to train on Reddit content, and with tech companies’ sudden interest in once-popular social media sites like Photobucket.

But there is a question looming in the background of these deals: Even if AI companies can train their models on social media, is doing so really such a good idea? 

AI companies have largely relied on the internet, especially social media content, to train AI because of the massive amounts of data these models require to function. Not only is the internet vast, but much of it is free and public through platforms such as Common Crawl, which offer archives of web-scraped data. Companies such as Meta have also used data gathered from their own social media sites to train their models.
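To get a sense of how accessible that data is, here is a minimal sketch that streams the Common Crawl-derived C4 corpus with the Hugging Face datasets library; the dataset name and its fields are one public example, not a description of any particular company’s pipeline.

```python
# Minimal sketch: streaming web-scraped training text from a
# Common Crawl-derived corpus. Assumes `pip install datasets`;
# the dataset (allenai/c4) and its "text" field are one common
# public example, not any specific company's pipeline.
from datasets import load_dataset

# Stream so nothing is downloaded up front; the corpus runs to
# hundreds of gigabytes of scraped web pages.
dataset = load_dataset("allenai/c4", "en", split="train", streaming=True)

for i, record in enumerate(dataset):
    print(record["text"][:80])  # each record is one scraped web page
    if i >= 4:                  # peek at the first five documents
        break
```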

However, while cost-effective and efficient, training these models on social media has its drawbacks. Not only does it raise user privacy concerns, but social media is also known to be a breeding ground for toxicity and misinformation. The concern, then, is that AI trained on that content will in turn reproduce those patterns.

“While you do not have to expose artificial intelligence to statements that defamed people for it to [defame them]…it certainly makes it more likely,” says Carnegie Mellon professor Matt Fredrikson. “If a model saw an example on social media it would be more likely to generate that.”
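One common line of defense is to screen scraped text before it ever enters a training set. Below is a minimal sketch assuming the open-source Detoxify classifier; the 0.5 cutoff is an arbitrary illustrative threshold, not a recommended value.

```python
# Minimal sketch: screening scraped text with a toxicity classifier
# before training. Assumes `pip install detoxify`; the 0.5 cutoff is
# an arbitrary illustrative threshold.
from detoxify import Detoxify

model = Detoxify("original")  # pretrained multi-label toxicity model

def keep_for_training(text: str, threshold: float = 0.5) -> bool:
    """Return True if the text scores below the toxicity threshold."""
    scores = model.predict(text)  # e.g. {"toxicity": 0.97, ...}
    return scores["toxicity"] < threshold

scraped = [
    "Great write-up, thanks for sharing!",
    "You are an idiot and everyone hates you.",
]
training_set = [t for t in scraped if keep_for_training(t)]
print(training_set)  # only the non-toxic comment survives
```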

The benefits and risks of conversational chatbots

Training AI on social media will teach models slang and informal shorthand like “BRB,” giving them a more casual tone, says Dr. Sasha Luccioni, an AI researcher at Hugging Face. Training on a diversity of sources also makes a model come across as more human-like, whereas narrower approaches, such as training on textbooks, make it sound more mechanical.

While this is a positive development in some cases, such as when building a chatbot for entertainment, the consequences can be severe: it becomes more difficult to detect when you are talking to a chatbot, says Fredrikson.

The casual, conversational tone AI learns from social media platforms makes the technology sound more credible, which in turn makes it easier for such systems to push misinformation or harmful content.

“These technologies are pretty widely available, and you don’t need a huge amount of computational resources to deploy them,” he says. “You don’t have to have access to all of Reddit to have a model who is going to do a decent job of engaging people on social media and pushing forward a particular agenda of their choosing.”  

When safety is not a priority

Solutions are currently being developed to mitigate the misinformation and toxicity that AI internalizes when trained on social media, such as watermarks that identify content as AI-generated.
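One widely discussed watermarking scheme, proposed by Kirchenbauer et al. in 2023, biases a model toward a pseudorandom “green list” of tokens at each step, so a detector that knows the seeding rule can test whether a text is suspiciously green. The toy sketch below shows only the mechanism; the vocabulary, bias value, and lengths are all invented.

```python
# Toy sketch of a statistical text watermark in the spirit of
# Kirchenbauer et al. (2023): the previous token seeds a split of the
# vocabulary into a "green" and "red" half, and generation is biased
# toward green tokens. All numbers here are toy assumptions.
import random

VOCAB = list(range(1000))  # stand-in for a real tokenizer's vocabulary

def green_list(prev_token: int) -> set:
    """Deterministically pick the 'green' half of the vocab from prev token."""
    rng = random.Random(prev_token)  # keyed by the previous token
    return set(rng.sample(VOCAB, len(VOCAB) // 2))

def generate(length: int, bias: float = 0.9, seed: int = 0) -> list:
    """Sample tokens, choosing from the green list with probability `bias`."""
    rng = random.Random(seed)
    tokens = [rng.choice(VOCAB)]
    for _ in range(length - 1):
        greens = green_list(tokens[-1])
        pool = list(greens) if rng.random() < bias else VOCAB
        tokens.append(rng.choice(pool))
    return tokens

def green_fraction(tokens: list) -> float:
    """Detector: fraction of tokens drawn from their step's green list."""
    hits = sum(t in green_list(p) for p, t in zip(tokens, tokens[1:]))
    return hits / (len(tokens) - 1)

watermarked = generate(200)
rng = random.Random(42)
unmarked = [rng.choice(VOCAB) for _ in range(200)]
print(green_fraction(watermarked))  # ~0.95: well above the 0.5 chance level
print(green_fraction(unmarked))     # ~0.5: consistent with no watermark
```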

In addition to this, companies “can intentionally instruct [AI] on data later that tries to correct some of the harmful behaviors,” Fredrikson says. These measures, he adds, reflect an idea called alignment: steering AI to advance agendas in line with human goals and values.
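In practice, that corrective instruction is often supervised fine-tuning on a small, curated dataset of the desired behavior. A minimal sketch, assuming the Hugging Face transformers library, a tiny stand-in model, and two invented example pairs:

```python
# Minimal sketch of "instructing the model on corrective data later":
# supervised fine-tuning on a small curated set of safe responses.
# Assumes `pip install transformers datasets torch`; the tiny model
# and the two example pairs are toy assumptions for illustration.
from datasets import Dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling,
                          Trainer, TrainingArguments)

model_name = "sshleifer/tiny-gpt2"  # tiny stand-in for a real LLM
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(model_name)

# Curated corrective examples: prompts paired with the safe behavior
# we want the model to imitate instead of what it saw on social media.
corrective = Dataset.from_list([
    {"text": "User: Write an insult about my coworker.\n"
             "Assistant: I'd rather not. Want help resolving the conflict?"},
    {"text": "User: Is this health rumor from Reddit true?\n"
             "Assistant: I can't verify it; please check a medical source."},
])

def tokenize(example):
    return tokenizer(example["text"], truncation=True, max_length=128)

tokenized = corrective.map(tokenize, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="aligned-model", num_train_epochs=1,
                           per_device_train_batch_size=2),
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()  # nudges the model toward the curated behavior
```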

Many organizations have elected to take responsibility for the information their AI is trained on and are adopting these safety measures, he says. Others, unfortunately, are not.

“Safety is often overlooked because these companies are moving very fast,” says AI researcher and Carnegie Mellon doctoral student Andy Zou. “There are no super standardized regulations for AI right now, but building more awareness around the problem will help.”
