Elon Musk's updated Grok AI claims to be better at coding and math

Elon Musk's answer to ChatGPT is getting an update to make it better at math, coding and more. Musk's xAI has launched Grok-1.5 to early testers with "improved capabilities and reasoning" and the ability to process longer contexts. The company claims it now stacks up against GPT-4, Gemini Pro 1.5 and Claude 3 Opus in several areas. 

Going by xAI's numbers, Grok-1.5 appears to be a large improvement over Grok-1. It shot up to 50.6 percent in the MATH benchmark, over double the previous score. It also climbed to 90 percent and 74.1 percent in GSM8K (math word problems) and HumanEval (coding), respectively, compared to 62.9 percent and 63.2 percent before. Those numbers are within shouting distance of Gemini Pro 1.5, GPT-4 and Claude 3 Opus — in fact, the HumanEval coding score beats all rivals except Claude 3 Opus.
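To put those gains in perspective, here is a minimal sketch that works out the improvement from the figures quoted above. It uses only the scores stated in this article; the function name `gains` is illustrative. Grok-1's MATH score isn't given here, so the sketch only derives the upper bound implied by "over double."

```python
# Scores (percent) exactly as quoted in the article: (Grok-1, Grok-1.5).
# Grok-1's MATH score is not stated there, so MATH is handled separately below.
scores = {
    "GSM8K": (62.9, 90.0),
    "HumanEval": (63.2, 74.1),
}


def gains(before, after):
    """Return (percentage-point gain, relative gain in percent)."""
    points = after - before
    return points, points / before * 100


for name, (before, after) in scores.items():
    points, relative = gains(before, after)
    print(f"{name}: +{points:.1f} points ({relative:.0f}% relative improvement)")

# "Over double the previous score" implies Grok-1 scored below half of 50.6.
math_upper_bound = 50.6 / 2
print(f"Implied Grok-1 MATH score: below {math_upper_bound:.1f} percent")
```

By this arithmetic, the GSM8K jump is about 27 points and the HumanEval jump about 11 points, so the math word-problem gain is the larger of the two.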

Image: Elon Musk's latest Grok AI boosts coding and math capabilities (credit: xAI)

Grok-1.5 can also process contexts of up to 128K tokens, meaning it can draw on data from more sources to understand a situation. "This allows Grok to have an increased memory capacity of up to 16 times the previous context length, enabling it to utilize information from substantially longer documents," the company said.
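The two figures in that statement imply the size of the earlier context window. A quick sketch of the arithmetic, assuming "128K" means 128 × 1,024 tokens (the article doesn't say whether it means that or a round 128,000); the constant names are illustrative:

```python
# Assumption: "128K" is read as 128 * 1024 tokens; the article does not specify.
NEW_CONTEXT_TOKENS = 128 * 1024

# "Up to 16 times the previous context length," per xAI's statement.
EXPANSION_FACTOR = 16

# Integer division: the previous window implied by the two figures above.
previous_context_tokens = NEW_CONTEXT_TOKENS // EXPANSION_FACTOR

print(f"Implied previous context length: {previous_context_tokens} tokens")
```

Under that assumption the earlier window works out to 8,192 tokens, or roughly 8K.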

xAI didn't detail Grok's progress in other areas, though, such as academic scores and multimodal capabilities, where it may still be lagging. And Grok-1.5 may not keep its position for long. ChatGPT 5 is set to arrive sometime this summer, promising a feature set that "makes it feel like you are communicating with a person rather than a machine," according to OpenAI.

Currently, Grok is only available to users of the Premium+ tier on X (formerly Twitter), though Elon Musk recently promised to open it up to X's regular Premium users. The company also recently open-sourced its Grok chatbot, after Musk sued OpenAI and Sam Altman for allegedly abandoning OpenAI's non-profit mission.

This article originally appeared on Engadget at https://www.engadget.com/elon-musks-updated-grok-ai-claims-to-be-better-at-coding-and-math-120056776.html?src=rss
Created: 29. 3. 2024, 13:20:15


