Wikipedia has been struggling with the load that AI crawlers, bots that scrape text and multimedia from the encyclopedia to train generative artificial intelligence models, have placed on its servers. That load has driven up costs and, in some cases, slowed load times for human users. Perhaps in an effort to stop the bots from pummeling the public Wikipedia website and soaking up too much bandwidth, the Wikimedia Foundation (which manages Wikipedia's data) is offering AI developers a dataset they can freely use.
The organization has teamed up with Kaggle, a data science platform, to offer a beta release of a structured dataset in both English and French. According to Google, which owns Kaggle, the dataset is formatted for machine learning, making it more useful for training, development and data science.
Wikimedia Enterprise, a part of the Wikimedia Foundation that seeks to make Wikipedia data available through APIs, notes that the dataset includes "abstracts, short descriptions, infobox-style key-value data, image links and clearly segmented article sections." It leaves out references and other "non-prose elements," such as video clips. The lack of references could make attribution for information in the dataset somewhat foggy. However, Wikimedia Enterprise says the content is freely available under Creative Commons licenses, the public domain and similar open terms, since it all comes from Wikipedia.
This article originally appeared on Engadget at https://www.engadget.com/ai/wikipedia-offers-ai-developers-a-training-dataset-to-maybe-get-scraper-bots-off-its-back-143255593.html?src=rss