For much of the AI era, intelligence has been on-demand: a user issues a prompt, and the model responds after reasoning through the request. But as AI systems grow more autonomous and expectations rise for real-time reasoning, low latency, and cost-efficiency, the definition of intelligence is shifting. We’re entering a new phase where AI is expected to stay ready for the next request—even during downtime.
The key to unlocking this proactive AI future may lie in an unexpected moment: when the AI is “asleep,” a phase now called sleep-time compute.
The term was coined in an April 2025 white paper by Letta, an AI startup spun out of UC Berkeley’s Sky Computing Lab and founded by researchers Charles Packer and Sarah Wooders. Developed in collaboration with Databricks and Anyscale cofounder Ion Stoica and others, the sleep-time compute framework aims to shift AI from reactive to proactive intelligence. Instead of waiting for prompts, AI agents use idle time to precompute answers, refine memory, and anticipate user needs.
Wooders says the idea draws inspiration from neuroscience. Just as humans consolidate memories during sleep and reflect beyond immediate tasks, AI should be able to do the same.
“We might think about a conversation that we had with someone yesterday and make new conclusions from it, or spend time learning new things even if there is no immediate job to be done. AI agents, on the other hand, don’t spend any time ‘thinking’ outside of the scope of a task,” she tells Fast Company. “With sleep-time, the idea is to give AI agents the same ability to think and process offline just like we do as humans.”
The result is an always-on AI system that’s faster, more cost-efficient, and more responsive. The paper reports accuracy gains of up to 18% on certain reasoning tasks and a 2.5-times reduction in cost per query. By spreading computation across related queries and reducing redundant processing, response times and operational costs both fall significantly.
Why Wait When Your AI Can Think Ahead?
Letta’s approach uses a dual-agent model. One agent handles live interactions; the second, the sleep agent, activates during downtime to analyze past conversations, parse uploaded documents, and reorganize memory. This division allows the system to maintain context without reprocessing everything in real time.
Wooders says the goal is to let agents learn offline by generating “learned context,” or consolidated insights from prior data. “As context windows grow larger, an agent might have a ton of tokens dedicated to storing this learned context—increasing the likelihood that any new task or question is about a topic that it’s already thought about,” she says.
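To make the division of labor concrete, here is a minimal sketch of the dual-agent pattern in Python. Every name in it (PrimaryAgent, SleepAgent, MemoryStore) is an illustrative assumption, not Letta’s actual API: one agent answers live requests, while the other consolidates raw history into “learned context” during idle time.

```python
# Illustrative sketch only; class names and methods are hypothetical,
# not Letta's actual API.
import asyncio

class MemoryStore:
    """Shared state that both agents read and write."""
    def __init__(self):
        self.raw_history = []      # unprocessed conversation turns
        self.learned_context = []  # consolidated insights

class PrimaryAgent:
    """Handles live user requests using the current memory state."""
    def __init__(self, memory: MemoryStore):
        self.memory = memory

    async def respond(self, prompt: str) -> str:
        self.memory.raw_history.append(prompt)
        # A real system would call an LLM here, feeding it the prompt
        # plus the consolidated learned_context.
        return f"answer({prompt})"

class SleepAgent:
    """Runs during idle periods, consolidating history into insights."""
    def __init__(self, memory: MemoryStore):
        self.memory = memory

    async def consolidate(self) -> None:
        while self.memory.raw_history:
            turn = self.memory.raw_history.pop(0)
            # A real system would call an LLM offline to summarize,
            # cross-reference, and reorganize memory.
            self.memory.learned_context.append(f"insight({turn})")

async def main():
    memory = MemoryStore()
    primary, sleeper = PrimaryAgent(memory), SleepAgent(memory)
    print(await primary.respond("Summarize yesterday's meeting"))
    await sleeper.consolidate()  # runs while no user is waiting
    print(memory.learned_context)

asyncio.run(main())
```

The design choice worth noting is the shared memory store: the live agent never pays the cost of consolidation, but it benefits from whatever insights the sleep agent has already written.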
For his part, Packer calls sleep-time compute a successor to test-time compute (TTC), and “the next big direction for scaling AI.” Rather than only adding compute during inference, systems can now scale intelligence during downtime.
Sleep-time compute builds on the idea that “the longer a model can reason . . . the better the final answer,” says Packer. By staying active during downtime, AI agents can refine their memory, precompute likely responses, and redistribute compute resources more efficiently to improve both performance and cost. Stoica, the Anyscale cofounder and UC Berkeley professor, sees this shift as pivotal, noting that “vast quantities of compute will be spent on reasoning at training time or sleep time” to create shared context, unlocking greater efficiency when models are in use.
Test-time compute, or inference, refers to the resources a model spends applying its knowledge to generate outputs. Allocating more resources at this stage improves output quality but increases latency and back-end costs.
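One common form of test-time scaling is best-of-n sampling: generate several candidate answers and keep the best-scoring one. The sketch below is a toy illustration of that trade-off; generate and score are hypothetical stand-ins for a model call and a verifier, not any specific vendor’s API.

```python
# Toy illustration of test-time scaling via best-of-n sampling.
# `generate` and `score` are hypothetical stand-ins, not a real API.
import random

def generate(prompt: str) -> str:
    # Stand-in for sampling one candidate answer from a model.
    return f"candidate-{random.randint(0, 9)} for: {prompt}"

def score(answer: str) -> float:
    # Stand-in for a verifier or reward model rating the answer.
    return random.random()

def best_of_n(prompt: str, n: int = 8) -> str:
    # Larger n means more compute per query: typically better answers,
    # but higher latency and back-end cost, as described above.
    candidates = [generate(prompt) for _ in range(n)]
    return max(candidates, key=score)

print(best_of_n("What is 17 * 23?"))
```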
Always-on tools like chatbots and coding assistants need fast, low-latency responses to serve users effectively. And as these tools grow more complex, their compute needs climb, says Anyscale cofounder Robert Nishihara, who points to “sophisticated agentic systems that demand significant computational resources.”
Letta’s research shows that sleep-time compute also makes models more capable. In benchmark tasks like GSM-Symbolic and the American Invitational Mathematics Examination (AIME), shifting computation to downtime reduced test-time workload by up to five times without hurting accuracy. Agents can update knowledge, refine memory, and improve performance, all without human input or added GPUs.
Sleep-Time Compute Is Already Reshaping Billion-Dollar Stacks
The concept may sound theoretical, but major tech companies like OpenAI, Anthropic, Cursor, and Google are already building with sleep-time principles.
During an interview panel in June, OpenAI CEO Sam Altman previewed how our AI interactions will shift.
“I’m excited about a future where multiple copies of AI models like o3 run constantly in the background—reading Slack, checking emails, acting like a team of helpful agents,” said Altman. “I’d love to wake up to drafted email replies and a summary of unfinished tasks, like ‘here’s what you didn’t finish on your to-do list yesterday,’ with suggested next steps.”
OpenAI’s AI coding tool Codex now enables asynchronous code refactoring in cloud environments. Moreover, AI code editor Cursor recently launched background agents that operate in parallel cloud environments. Developers can deploy a fleet of agents that run test suites, refactor code, and generate new features in the background, guided by context.
Anthropic’s Claude Code SDK offers similar functionality. Developers can deploy subprocesses that function like backstage assistants, handling testing or debugging without interrupting the main workflow. Google’s “Project Naptime” and “Big Sleep,” an internal collaboration between Project Zero and DeepMind, are also exploring principles similar to sleep-time compute for code vulnerability detection.
Building the Future of Ambient Intelligence
Letta has embedded sleep-time compute in MemGPT 2.0, an open-source framework that equips AI agents with persistent, efficient long-term memory—what it calls “infinite context.” By offloading memory tasks to sleep-time phases, the framework improves context management and reliability. Here, sleep-time compute acts like a silent housekeeper, running continuously to stay organized.
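The “silent housekeeper” behavior can be approximated with a simple idle-triggered scheduler. The sketch below is an illustration under stated assumptions, not MemGPT’s actual interface: the idle threshold and the consolidate() hook are invented for the example.

```python
# Hedged sketch of an idle-triggered "housekeeper"; the threshold and
# consolidate() hook are illustrative, not MemGPT's actual interface.
import time

IDLE_THRESHOLD_S = 60.0  # assumed idle window before sleep-time work starts

class Housekeeper:
    def __init__(self, consolidate):
        self.consolidate = consolidate        # sleep-time work to run
        self.last_activity = time.monotonic()

    def on_user_message(self) -> None:
        # Any live interaction resets the idle clock.
        self.last_activity = time.monotonic()

    def tick(self) -> None:
        # Called periodically; does sleep-time work only when idle.
        if time.monotonic() - self.last_activity > IDLE_THRESHOLD_S:
            self.consolidate()

hk = Housekeeper(consolidate=lambda: print("consolidating memory..."))
hk.tick()                                  # no-op: agent was just active
hk.last_activity -= IDLE_THRESHOLD_S + 1   # simulate a long idle gap
hk.tick()                                  # now the sleep-time work runs
```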
Through asynchronous memory consolidation and simulated scenarios, Letta is advancing a long-standing goal in AI: agents that prepare for the future, not just react in the moment.
Test-time scaling often slows down the user experience, with tasks like Deep Research taking minutes to complete. “But with sleep-time compute, the time the agent can spend thinking is unlimited,” says Wooders. “It’s about creating a new dimension of scaling compute, which historically has led to improvements in AI’s capabilities.”
Letta says its framework is already making an impact, from financial chatbots summarizing earnings reports overnight to medical agents analyzing patient histories while the system is idle.
Letta’s model-agnostic infrastructure lets developers mix and match models within a single agent. For example, a chat agent might run on OpenAI while a sleep-time agent handles memory on Anthropic. This makes it easier to build AI that feels “stateful, always-on, and proactive,” says Packer.
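As a rough illustration of what that mixing and matching could look like, here is a hypothetical configuration sketch; the dataclass and the model names are placeholders, not Letta’s actual configuration schema.

```python
# Hypothetical per-role provider configuration; this dataclass and the
# model names are placeholders, not Letta's actual schema.
from dataclasses import dataclass

@dataclass
class AgentConfig:
    role: str      # "chat" for live traffic, "sleep" for offline work
    provider: str  # e.g., "openai" or "anthropic"
    model: str     # placeholder model identifiers

agent_stack = [
    AgentConfig(role="chat",  provider="openai",    model="gpt-4o"),
    AgentConfig(role="sleep", provider="anthropic", model="claude-sonnet"),
]

for cfg in agent_stack:
    print(f"{cfg.role} agent -> {cfg.provider}/{cfg.model}")
```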
As AI evolves toward multi-agent systems, the ability to think ahead could define the next wave of tech breakthroughs. The most powerful systems won’t just be those with the largest models, but those that know how to process information quietly and efficiently, even in their sleep.
“In the future, vast quantities of compute will be spent on reasoning at sleep time by agents to make sense of new information and context that the agents encounter,” says Stoica, the UC Berkeley professor. “I expect this direction to be a major driver of progress in AI. Engineering the right shared context through reasoning at training time or sleep time will allow for far more efficiency at test time.”