Hi everyone, Jerry and Wyatt here from Halluminate (https://halluminate.ai/). We help AI labs train computer use agents with high quality data and RL environments.
Training AI agents to use computers, browsers, and software is one of the highest-potential opportunities for AI. To date, however, this capability is still unreliable. The emerging method to improve this is called Reinforcement Learning with Verifiable Rewards (RLVR). However, researchers are currently bottlenecked by a lack of high-quality simulators and task + verifiers.
To solve this problem, we’re building Westworld, a fully-simulated internet made up of synthetic versions of the most common consumer and enterprise apps. Agents use Westworld to learn how to do economically valuable tasks.
For example, AI agents can practice planning vacations on a simulated flight booking site (https://flights.halluminate.ai/), or learn how to reorganize outdated information in your sales platform, or train to do financial modeling directly in a spreadsheet.
Here’s a demo showing our flight booking simulation: https://www.loom.com/share/74a3b28067e24c1b886054ba90a90aa5.
How it works: AI agents access our environment and are given a task + verifier. A task is basically an objective for the agent to achieve, for example "Book me a flight from SF to NYC on this date with x, y, z filters.” A verifier is a programmatic way to determine if the task was successfully completed. For example, in this case it might be a json that checks if the final flight data matches expectations. These signals can then be used to calculate a reward in RL.
The more simulators we build, the more AI labs can improve on capabilities that computer use agents are currently weak at. One of our customers saw a ~20% improvement in date-picking performance when training on our flight booking simulator.
Two things make this hard so far:
(1) The simulations have to be realistic. You can’t get away with a vibe-coded “80% solution” because even small divergences impact performance. Generating simulated data is even harder. For example, massaging flight data to look realistic took a lot of trial and experimentation.
(2) The tasks you train agents on have to be well-chosen. They are only valuable if they reflect work that people actually want solved. We need a lot of feedback from domain experts to get this right.
That said, we find this work incredibly interesting and are excited to tackle these issues. A few things we are pumped to ship in the near term: - Ability to train on long-horizon tasks by stringing multiple simulators together for extended workflows; - Procedural data generation. Instead of synthetically generating all the data upfront, how can we model data generation so that our simulators are populated procedurally as agents explore (think Minecraft); - Open source! We plan to release our environments to the public so developers/researchers can hack them for their own experimentation.
RL simulators are just one part of our business. The other part is around human data creation (think Scale AI but for computer use). We produce off-the-shelf pre-training/fine-tuning datasets, expert human evaluation/error analysis, or any other data needs for our customers. There are also a lot of exciting overlaps between the two - for example, using human experts to help create our simulators/tasks. Happy to go in more detail, but we thought that simulators would make for the more interesting HackerNews post :)
Finally, about us: Wyatt and I met while studying CS at Cornell and have been living and working together for over 7 years. I previously led product/research at Capital One Labs, where I launched one of the first AI agents in banking. Wyatt previously was a Cornell Milstein scholar and did large-scale data engineering for 2 early-stage startups in NYC. We left our jobs last year, and faced these problems first-hand while building evals for our customers who were browser/computer use agent companies.
If anyone has any questions, feedback, or thoughts please let us know! Looking forward to your comments.
Comments URL: https://news.ycombinator.com/item?id=44865290
Points: 5
# Comments: 0
Connectez-vous pour ajouter un commentaire
Autres messages de ce groupe

Article URL: https://planetscale.com/blog/announcing-neki
Comments URL: ht


Article URL: https://www.bbc.com/news/articles/cjr11qqvvwlo