Agentic AI is driving a complete rethink of compute infrastructure

When artificial intelligence first gained traction in the early 2010s, general-purpose central processing units (CPUs) and graphics processing units (GPUs) were sufficient to run early neural networks, image generators, and language models. But by 2025, the rise of agentic AI—that is, models capable of thinking, planning, and acting autonomously in real time—has fundamentally changed the equation.

With a single click, these AI-powered assistants can turn work items into real outcomes—from booking venues and handling HR tickets to managing customer queries and orchestrating supply chains.

“We’re heading into a world where hundreds of specialized, task-specific models known as agents can work together to solve a problem, much like human teams do,” says Vamsi Boppana, SVP of the AI group at Advanced Micro Devices (AMD). “When these models communicate with one another, the latency bottlenecks of traditional data processing begin to disappear. This machine-to-machine interaction is unlocking an entirely new level of intelligence.”
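
Boppana's description of machine-to-machine handoffs is easy to picture in code. The sketch below is purely illustrative (each "agent" is a plain Python function standing in for a model behind a network endpoint), but it shows why per-hop latency matters: every handoff in the chain is another round of inference.

```python
# A toy sketch of the multi-agent handoff pattern Boppana describes.
# Each "agent" here is just a Python function; in production, each would be
# a specialized model behind its own inference endpoint, so every handoff
# below would add a network round trip plus inference latency.

def research_agent(task: str) -> str:
    # Specialized agent #1: gathers information for the task.
    return f"findings for '{task}'"

def writer_agent(findings: str) -> str:
    # Specialized agent #2: turns findings into a deliverable.
    return f"draft report based on: {findings}"

def orchestrator(task: str) -> str:
    # Machine-to-machine interaction: one agent's output feeds the next.
    findings = research_agent(task)
    return writer_agent(findings)

print(orchestrator("summarize GPU supply constraints"))
```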

As enterprises integrate AI agents into live workflows, they are realizing that true autonomy requires a fundamentally new computing foundation.

“The shift from static inference to agentic operation is putting unprecedented pressure on back-end infrastructure, with demand for compute, memory, and networking growing exponentially across every domain,” Boppana adds. “Ultra-low latency data processing, memory-aware reasoning, dynamic orchestration, and energy efficiency are no longer optional—they are essential.”

To support these demands, the industry is moving toward custom silicon designed specifically for autonomous agents. Tech leaders such as Meta, OpenAI, Google, Amazon, and Anthropic are now codesigning silicon, infrastructure, and orchestration layers to power what could become the world’s first truly autonomous digital workforce.

“We work closely with partners like OpenAI, Meta, and Microsoft to co-engineer systems optimized for their specific AI workloads, both for inference and training,” Mark Papermaster, AMD’s chief technology officer, tells Fast Company. “These collaborations give us early insight into evolving requirements for reasoning models and their latency needs for real-time inference. We are also seeing CPUs playing an increasingly important role in agentic AI for orchestration, scheduling, and data movement.”

These companies are also investing in supercomputing systems, cooling technologies, and AI-optimized high-density server racks to manage resources for thousands of concurrent AI agents.

“When you ask Gemini to work with you to create a research report using a few dozen documents or to summarize weekly research on a podcast, it utilizes the AI Hypercomputer [Google’s supercomputing system] to support those requests,” says Mark Lohmeyer, vice president and general manager of compute and AI/machine learning infrastructure at Google Cloud. “Our current infrastructure is designed in deep partnership with the leading model, cloud, and agentic AI builders such as AI21, SSI, Nuro, Salesforce, HubX, Essential AI, and AssemblyAI.”

The Shift from Broad Compute to Purpose-Built Silicon

Agentic systems don’t operate in isolation. They constantly interact with enterprise databases, personal devices, and even vehicles. Inference—the model’s ability to apply its learned knowledge to generate outputs—is a continuous requirement.
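
A minimal loop makes that continuous requirement concrete. In the hypothetical sketch below, `call_model` stands in for a hosted inference endpoint; the point is that the agent pays an inference call on every iteration of the loop, not just once per user request, which is what drives the latency and throughput demands described above.

```python
import time

# Hypothetical stand-in for an inference endpoint. In a real agent this is a
# network call to a model server, and its latency is paid on every loop step.
def call_model(observation: str) -> str:
    time.sleep(0.05)  # simulate ~50 ms of inference latency
    if "inventory" in observation:
        return "ACT: reorder_stock"
    return "DONE"

# Hypothetical tool the agent can invoke against an enterprise system.
def reorder_stock() -> str:
    return "stock reordered; levels nominal"

TOOLS = {"reorder_stock": reorder_stock}

def run_agent(task: str, max_steps: int = 10) -> None:
    observation = task
    for step in range(max_steps):
        decision = call_model(observation)  # inference on every single step
        if decision == "DONE":
            print(f"finished after {step + 1} model calls")
            return
        _, tool_name = decision.split(": ")
        observation = TOOLS[tool_name]()    # act, then observe the result

run_agent("check inventory levels for warehouse 7")
```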

“Agentic AI requires much more hardware specialization to support its constant inference demands,” says Tolga Kurtoglu, CTO at Lenovo. “Faster inferencing equals efficient AI, and this is as true in the data center as it is on-device.”

To avoid inference bottlenecks, tech companies are partnering with chipmakers to build silicon tailored for low-latency inference. OpenAI is developing custom chips and hiring hardware-software codesign engineers, while Meta is optimizing memory hierarchies and parallelism in its MTIA accelerators and Grand Teton infrastructure.

“We’ve embraced a codesign approach for a long time, evident in our latest AI advancements like Gemini 2.5, or Alphabet reaching 634 trillion tokens in Q1 of 2025. Agentic experiences often require multiple subsystems to work together across the stack to ensure a useful, engaging experience for users,” Lohmeyer says. “Our decade-plus investment in custom AI silicon has yielded Tensor Processing Units (TPUs) purposefully built for large-scale, agentic AI systems.”

TPUs are built to be more efficient and faster than CPUs and GPUs for specific AI tasks. At the Google Cloud Next 2025 conference in April, the company introduced its seventh-generation TPU, called Ironwood, which can scale to 9,216 chips per pod with inter-chip interconnect capabilities for advanced AI workloads. Models like Gemini 2.5 and AlphaFold run on TPUs.

“Ironwood TPUs are also significantly more power-efficient, which ultimately reduces the cost of deploying sophisticated AI models. This approach, demonstrated by our partnerships with AI21 Labs, Anthropic, Recursion, and more, underscores the fundamental but necessary industry shift toward purpose-built AI infrastructure,” Lohmeyer says.

Transformer-optimized GPU accelerators such as AMD’s Instinct MI series, along with neural processing units (NPUs) and systems on chip (SoCs), are being engineered for real-time adaptability. AMD recently launched its Instinct MI350 series GPUs, designed to accelerate workloads across agentic AI, generative AI, and high-performance computing.

“Agentic AI demands more than accelerators alone. It requires full-system solutions with CPUs, GPUs, and high-bandwidth networking working in concert,” says AMD’s Papermaster. “Through OCP-compliant systems like Helios, we remove latency hotspots and improve data flow. This integration has already delivered major results. We are now targeting a further 20 times rack-level efficiency improvement by 2030 to meet the demands of increasingly complex multi-agent workloads.”

According to AMD, seven of the world’s top 10 AI model builders—including Meta, OpenAI, Microsoft, and xAI—are already running production workloads on Instinct accelerators.

“Customers are either trying to solve traditional problems in completely new ways using AI, or they’re inventing entirely new AI-native applications. What gives us a real edge is our chiplet integration and memory architecture,” Boppana says. “Meta’s 405B-parameter model Llama 3.1 was exclusively deployed on our MI series because it delivered both strong compute and memory bandwidth. Now, Microsoft Azure is training large mixture-of-experts models on AMD, Cohere is training on AMD, and more are on the way.”

The MI350 series, including the Instinct MI350X and MI355X GPUs, delivers a fourfold generation-on-generation increase in AI compute and a 35-fold leap in inference performance.

“We are working on major gen-on-gen improvements,” Boppana says. “With the MI400, slated to launch in early 2026 and purpose-built for large-scale AI training and inference, we are seeing up to 10 times the gain in some applications. That kind of rapid progress is exactly what the agentic AI era demands.”

Power Efficiency Now Drives Design, From Data Center to Edge

Despite their performance promise, generative and agentic AI systems come with high energy costs. A Stanford report found that training GPT-3 consumed about 1,287 megawatt-hours of electricity—roughly what a 1.3-gigawatt nuclear power plant generates in an hour.

AI training and inference generate significant heat and carbon emissions, with cooling systems accounting for up to 40% of a data center’s energy consumption. As a result, power efficiency is now a top design priority.

“We are seeing strong demand from enterprises for more modular, decentralized, and energy-efficient deployments for their agent-based applications. They need to put AI agents wherever they make the most sense while also saving on costs and power,” Lohmeyer says.

Infrastructure providers like Lenovo are now delivering AI edge chips and data center racks tailored for distributed cognition. These allow on-device agents to make quick decisions locally while syncing with cloud-based models.

“Heat is the mortal enemy of sensitive circuitry and causes shutdowns, slower performance, and data loss if allowed to accumulate. We now build sustainability into servers with patented Lenovo Neptune water-cooling technology that recycles loops of warm water to cool data center systems, enabling a 3.5-fold improvement in thermal efficiency compared to traditional air-cooled systems,” Kurtoglu says. “Our vision is to enable AI agents to become AI superagents (a single point of entry for all user requests) and eventually graduate to AI twins. Realizing superagents’ full potential hinges on developing and sustaining the supercomputing power needed to support multi-agent environments.”

The Future of Enterprise AI Is Autonomous, but Challenges Remain

Despite growing momentum, key challenges persist. Kurtoglu says many CIOs and CTOs still struggle to justify the value of agentic AI initiatives.

“Lenovo’s AI Readiness Index 2025 revealed that agentic AI is the area businesses are struggling with the most, with one in six (16%) businesses admitting to having low or very low confidence in this area. That hesitation stems from three core concerns: trust, safety and control; complexity and reliability; and security in integration,” Kurtoglu says.

To address this, Lenovo recommends a hybrid AI approach in which personal, enterprise, and public AI systems coexist and support each other to build trust and scale responsibly.

“Hybrid AI enables trustworthy and sophisticated agentic AI because of its access to your sensitive data, locally on a trusted device or within a secure environment. It enhances responsiveness by not relying on the cloud, avoiding cloud ‘round trips’ for every question or decision,” Kurtoglu explains. “It’s also more resilient, with at least part of an agent’s tasks persisting even if cloud connectivity is intermittent.”
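
Kurtoglu's hybrid pattern reduces, in code, to a routing decision. The sketch below is a simplified illustration, not Lenovo's actual stack: `local_model` and `cloud_model` are hypothetical stand-ins, but they capture the three properties he lists. Sensitive data stays on-device, routine requests go to the cloud when it is reachable, and the agent keeps working when connectivity drops.

```python
# A minimal sketch of hybrid AI routing. Both model functions are
# hypothetical stand-ins, not any vendor's real API.

def local_model(prompt: str) -> str:
    # On-device model: private and always available, but less capable.
    return f"[local] handled: {prompt}"

def cloud_model(prompt: str) -> str:
    # Hosted model: more capable, but needs a network round trip.
    raise ConnectionError("cloud unreachable")  # simulate a dropped link

def answer(prompt: str, sensitive: bool) -> str:
    if sensitive:
        # Sensitive data never leaves the trusted device.
        return local_model(prompt)
    try:
        # Prefer the more capable hosted model for non-sensitive requests.
        return cloud_model(prompt)
    except ConnectionError:
        # Stay resilient: fall back locally when connectivity is intermittent.
        return local_model(prompt)

print(answer("summarize this quarter's HR tickets", sensitive=True))
print(answer("what's on my public calendar today?", sensitive=False))
```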

Lohmeyer adds that one major challenge for Google Cloud is helping customers manage unpredictable AI-related costs, especially as agentic systems create new usage patterns.

“It’s difficult to forecast usage when agentic systems drive autonomous traffic,” Lohmeyer explains. “That’s why we’re working with customers on tools like the Dynamic Workload Scheduler to help optimize and control costs. At the same time, we’re constantly improving our platforms and tools to handle the larger challenges of getting agent systems into production and making sure they’re governed properly.”

Boppana notes that enterprise interest in agentic AI is growing fast, even if organizations are at different stages of adoption. “Some are leaning in aggressively, while others are still figuring out how to integrate AI into their workflows. But across the board, the momentum is real,” he says. “AMD itself has launched more than 100 internal AI projects, including successful deployments in chip verification, code generation, and knowledge search.”

As agentic AI expands from server farms to the edge, the infrastructure behind it must be just as intelligent, distributed, and autonomous as the agents it supports. In that future, AI won’t just be written in code—it will be etched into silicon.

https://www.fastcompany.com/91362284/agentic-ai-is-driving-a-complete-rethink-of-compute-infrastructure?partner=rss&utm_source=rss&utm_medium=feed&utm_campaign=rss+fastcompany&utm_content=rss
