Meta’s Llama 3.1 is open-source, kind of. Here’s how it could reshape the AI race

Meta today released a trio of new open-source large language models called Llama 3.1, the largest of which may lead to new chatbots that rival ChatGPT. In fact, Meta CEO Mark Zuckerberg believes the company's Llama-powered AI assistant will be more widely used than ChatGPT by the end of this year.

Llama 3.1 is actually a small family of models: Llama 3.1 405B, 70B, and 8B. (The numbers denote each model's parameter count—that is, the adjustable weights on the neuron-like connections where the model's calculations are made.) The 405B model was trained on a massive amount of data: 15 trillion tokens, which represent words or word parts. The tokens represent web data dating to 2024 (earlier models have been limited in their recency by cut-off dates, sometimes years in the past).
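
Those headline figures can be sanity-checked with simple arithmetic. A minimal sketch, assuming 16-bit (2-byte) weights and treating "405B" and "15 trillion" as round numbers:

```python
# Back-of-envelope math for the Llama 3.1 405B figures quoted above.
# Assumes 2 bytes per parameter (fp16/bf16); the counts are headline
# round numbers, not exact.

params = 405e9   # 405 billion parameters (the "405B" in the name)
tokens = 15e12   # 15 trillion training tokens

# Memory needed just to hold the weights at 16-bit precision:
weights_gb = params * 2 / 1e9   # 2 bytes per parameter
print(f"~{weights_gb:.0f} GB of weights")   # ~810 GB

# Training tokens per parameter, a rough measure of how data-heavy the run was:
print(f"~{tokens / params:.0f} tokens per parameter")   # ~37
```

The first number is why even inference on the 405B model requires multiple high-memory GPUs, and why the distilled 70B and 8B variants matter in practice.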

The 405B model was trained using 16,000 of Nvidia's H100 graphics processing units. State-of-the-art "frontier" models are trained by processing large amounts of web-scraped, licensed, or synthetically generated text and image data. The new models can also call out (via APIs) to external tools and knowledge sources, such as up-to-date information retrieval, math functions, and code execution.
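
The tool-calling pattern described above is straightforward in outline: the model emits a structured request naming a tool, the host application runs it, and the result is fed back into the conversation. A minimal sketch of the host side, with illustrative tool names and a call format that are not Meta's actual API:

```python
# Sketch of a host-side tool dispatcher for model-emitted tool calls.
# The tool names and the {"tool": ..., "args": ...} format are hypothetical.

from datetime import date

def get_current_date() -> str:
    """Stand-in knowledge source: today's date as an ISO string."""
    return date.today().isoformat()

def calculator(expression: str) -> float:
    """Stand-in math tool. A real host would use a safe parser, not eval()."""
    return eval(expression, {"__builtins__": {}})

TOOLS = {"get_current_date": get_current_date, "calculator": calculator}

def dispatch(call: dict):
    """Run a model-emitted call like {'tool': 'calculator', 'args': {...}}
    and return the result to be appended to the model's context."""
    fn = TOOLS[call["tool"]]
    return fn(**call.get("args", {}))

result = dispatch({"tool": "calculator", "args": {"expression": "19 * 23"}})
```

The model itself never executes anything; it only produces the structured call, which is what makes the pattern auditable.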

Developers can download the new Llama models from Meta or from Hugging Face, or access them via major cloud services like AWS, Azure, and Databricks.

Meta calls the 405B version "the world's largest and most capable openly available foundation model." The company says the model beats OpenAI's GPT-4 and GPT-4o, along with Anthropic's Claude 3.5 Sonnet, on commonly used benchmark tests, and "is competitive with" those other models across a range of tasks. Meta believes developers will use its new Llama models to create more agentic chatbots, tools with greater reasoning capabilities, and better computer coding agents.

The company also points to Llama 3.1 405B's capacity for "synthetic data generation" and "model distillation" as examples of its power. The former means the ability of one large model to create training data for a smaller model. The latter means the ability of a large model (a "teacher") to transfer elements of its intelligence to a smaller ("student") model. Meta says it altered its commercial license agreement to allow for these uses. This could have important implications for how models work together, and economic implications for the return on investment in smaller models.
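
The two techniques can be illustrated with toy stand-ins for the models. In this sketch the "teacher" is a fixed decision rule, the "student" has a single tunable parameter, and real distillation would match the teacher's full output distribution rather than just its hard labels:

```python
# Toy illustration of synthetic data generation and (crude) distillation.
# Both "models" here are deliberately trivial stand-ins.

def teacher(x: float) -> int:
    """Stand-in for a large teacher model: a fixed classification rule."""
    return 1 if x > 0.5 else 0

# 1) Synthetic data generation: the teacher labels raw, unlabeled inputs,
#    producing a training set no human annotated.
inputs = [0.1, 0.4, 0.6, 0.9]
synthetic_dataset = [(x, teacher(x)) for x in inputs]

# 2) "Distillation": fit a one-parameter student to reproduce the teacher's
#    labels on the synthetic data (pick the threshold with fewest mismatches).
def fit_student(data):
    candidates = [0.0, 0.25, 0.5, 0.75, 1.0]
    def mismatches(t):
        return sum((1 if x > t else 0) != y for x, y in data)
    return min(candidates, key=mismatches)

student_threshold = fit_student(synthetic_dataset)
```

The economics follow the same shape: the expensive model is run once to produce data, and the cheap model inherits the behavior at a fraction of the inference cost.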

But the model will also power some consumer use cases. It now powers Meta’s AI assistant at Meta.ai (for U.S. users, anyway) and within WhatsApp. 

The new models are text-based, not multimodal. But Zuckerberg says in a new video posted on Instagram that his company is working on next-gen models to power multimodal features such as an “Imagine” feature that creates images based on a photo of a person and a prompt (for example, “Imagine me playing soccer”). Zuckerberg says his company is also working on technology that will allow users to create their own AI apps and share them across the company’s social platforms. 

Over the past few years, as the AI race has heated up and attracted billions in investment dollars, companies have grown more and more secretive about how their models are built and how they work. 

Meta says it’s making the model weights publicly available through Hugging Face and a group of technology partners (including Nvidia), along with some new safety tools designed to make sure people don’t prompt the model to do harmful things. 

Open-source advocates believe that AI can advance faster and better maintain safety if AI companies develop the burgeoning technology out in the open. Meta has long touted its commitment to open source, but many developers have noted that the company is open about only some aspects of its models.

“Meta is continuing the industry standard of open-washing in AI,” says Nathan Lambert, a machine learning expert who works at The Allen Institute for AI. Lambert says Zuckerberg and Meta’s definition of open-source differs in spirit from the major proposed definitions currently being debated by institutional working groups (which Meta participates in). 

Meta's definition of "open" seems to permit a lack of information on the data used to train the models. The parameter weights (generated during the model's pre-training) released with a model are important, but the substance and curation of the training data play an equal role in the model's performance, AI researchers have come to believe. "Meta's release documents detail the data being 'publicly available' with no definition or documentation," Lambert says.

Scale AI CEO Alexandr Wang says his company, which produces and sculpts synthetic training data, provided a large amount of data used in the fine-tuning and reinforcement learning from human feedback (RLHF) of the new Llama models. 

Others say it’s the terms of Meta’s commercial usage license that fall short. “Meta isn’t open washing (per se) but Meta’s custom license and limits on usage does violate the ethos of open source,” Gartner analyst Arun Chandrasekaran tells Fast Company in an email. 

Despite this, Chandrasekaran believes Llama 3.1 will have real impact for both businesses and consumers. “[T]his will be a very useful model to a large set of enterprise clients,” he says, “and we can also expect Meta to push AI features more aggressively in its consumer products.”

The big picture is that Meta is, first and foremost, a very rich social media company that makes its money selling ads within social feeds. It has assembled an impressive organization of highly paid AI researchers who can develop models that help with important parts of Meta's business, such as content moderation. But it's also in a position to seed the growing AI ecosystem with its free models and tools, which could benefit both Meta's influence and its bottom line in the future.

https://www.fastcompany.com/91161560/meta-releases-llama3-1-open-source-debate?partner=rss&utm_source=rss&utm_medium=feed&utm_campaign=rss+fastcompany&utm_content=rss

Created 1y | Jul 23, 2024, 21:30:09

