Tesla details how it finds punishing defective cores on its million-core Dojo supercomputers — a single error can ruin a weeks-long AI training run

Tesla's Stress tool detects and disables faulty cores in Dojo wafer-scale processors, which power Dojo clusters with millions of cores, without interrupting AI training.

https://www.tomshardware.com/pc-components/cpus/tesla-details-how-it-finds-punishing-defective-cores-on-its-million-core-dojo-supercomputers-a-single-error-can-ruin-a-weeks-long-ai-training-run

Created 1d ago | 07.06.2025, 14:10:05


Other posts in this group

 Alienware AW2725Q 4K 240 Hz OLED gaming monitor review: Serious value and high performance

Alienware brings value to the premium gaming monitor category with its AW2725Q. A 27-inch QD-OLED 4K panel with 240 Hz, Adaptive-Sync, HDR10, Dolby Vision and wide gamut

08.06.2025, 17:50:07 | tomshardware.com
 Commodore shocks retro TechTuber with option to buy 'the whole company'

In response to a licensing inquiry, Commodore Corporation has shocked retro TechTuber Perifractic with an option to buy 'the whole company'.

08.06.2025, 17:50:06 | tomshardware.com
 Bambu Lab removes third-party app support for P and A series 3D printers

Bambu Lab has released firmware for its P and A series 3D printers that removes support for the installation of third-party apps.

08.06.2025, 17:50:06 | tomshardware.com
 Microsoft Surface Pro vs. Surface Laptop: Snapdragon X Elite showdown

The Microsoft Surface Pro may be portable and versatile, but the Surface Laptop is the more practical option for most. We compare the two systems to see which is right fo

08.06.2025, 17:50:05 | tomshardware.com
 AMD unveils Ryzen AI Z2 Extreme with AI processing NPU, expands family with two new chips for handheld gaming consoles

New chips signal AMD’s push into a broader handheld market, giving OEMs more options

08.06.2025, 17:50:05 | tomshardware.com
 ChatGPT used to disable SecureBoot in locked-down device – modded BIOS reflash facilitated fresh Windows and Linux installs

A modding enthusiast has used ChatGPT to circumvent the Factory Reset Protection (FRP) features of a used tablet.

08.06.2025, 15:30:09 | tomshardware.com
 Researchers convert old phones into 'tiny data centers' — deploy one underwater for marine monitoring

A group of researchers from the University of Tartu developed a way to reuse old smartphones as edge nodes that process data in real time.

08.06.2025, 15:30:08 | tomshardware.com