Big Tech locks data away. Wikidata gives it back to the internet

While tech and AI giants guard their knowledge graphs behind proprietary walls, a more open model is quietly powering innovative projects from São Paulo to Nairobi. Wikidata, the collaborative backbone behind Wikipedia’s structured data, has become the world’s largest free knowledge database. 

Lydia Pintscher, who leads the Wikidata project at Wikimedia Deutschland, oversees this enormous experiment in open collaboration. More than 25,000 contributors across 190 countries have built a database containing 116.6 million data points, edited nearly 500,000 times daily. 

Unlike with proprietary alternatives, anyone can access, query, and contribute to this growing repository of human knowledge. Developers can build upon this community-driven knowledge base without worrying about corporate gatekeepers or sudden API changes.

Pintscher spoke with Fast Company about how open data challenges Big Tech dominance and enables innovation in underserved markets, and why transparency in knowledge graphs matters more than ever.

The conversation has been edited for length and clarity.

What are a few projects built on Wikidata that reflect technology’s potential for social good?

There are many, but the ones I’d like to highlight are:

  • Govdirectory, making it easier for people to get in touch with their government and make their voices heard on topics that matter to them
  • OpenSanctions, tracking politically exposed persons, their connections, and the sanctions imposed on them, ensuring that international sanctions are enforced
  • Aletheiafact, a fact-checking project from Brazil combating misinformation
  • Open Parliament TV, making it easier to track what politicians are saying in parliament about crucial issues
  • Gestapo.Terror.Orte, a project helping to understand the atrocities of the secret police in Nazi Germany

All of them are grassroots efforts, made possible or easier with the support of Wikidata’s data and community.

When developers could use proprietary APIs from major tech companies, why choose the more complex path of building on open data?

I’ll answer that question with another question: Do you want to be beholden to the whims of a major tech company that could decide tomorrow to no longer make the data available to you, or only make it available to you at a price, and under conditions you cannot agree to? Or would you rather work with and support a movement that cares deeply about access to knowledge for everyone?

On top of that, Wikidata empowers you to be an active participant, not just a consumer. You found an issue in the data? Something you really care about is missing? You can go and make the changes in Wikidata yourself, directly.

How does Wikidata’s approach differ from how companies like Google or Microsoft manage their knowledge graphs?

The starkest contrast is the openness. In Wikidata you can literally go to the website, look up an entry, and sift through every single change that has ever been made to that entry to see how it got to where it is today. And beyond just being able to see what that entry looks like now or looked like in the past, you can also make an edit to it and contribute to the sum of all human knowledge. Right there. With one edit.

The second difference is the complexity and nuance with which we try to model the world. Since the beginning of Wikidata I have found so many beautiful, weird, and thought-provoking entities that really don’t lend themselves to a simple model of the world. Did you know about that one year Sweden decided to have a February 30th, for example? Or all the countries that have more than one capital city? There are plenty of funny examples but also ones that really matter, such as disputed territories where other websites might decide to show you just one side of the dispute depending on where you access their site from. 

We can’t have civil conversations when we don’t even get shown that another view on a topic exists. That’s why I believe it is so important to surface at least some of that complexity. The world we live in is complex, weird, and beautiful, and the technology we use in that world needs to be able to reflect that.

What’s the most notable technical challenge you’ve solved that other organizations building global platforms should know about?

Making a knowledge graph the size of Wikidata publicly accessible and queryable to everyone is definitely a technical challenge, especially given how often the data is changed and accessed. Wikidata gets edited almost 500,000 times a day. Our SPARQL endpoint serves about 10,000 requests per minute, and that number is growing every day. Building and maintaining infrastructure to support that with the resources of a nonprofit is definitely a challenge.
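
For readers who have not used the SPARQL endpoint, here is a minimal Python sketch of what a request to it might look like. It assumes the public Wikidata Query Service address `https://query.wikidata.org/sparql` and the identifiers `P31` (instance of), `Q6256` (country), and `P36` (capital). The function names and the User-Agent string are illustrative and not part of any Wikidata library. The query asks for countries that have more than one capital city.

```python
import json
import urllib.parse
import urllib.request

# Assumed address of the public Wikidata Query Service SPARQL endpoint.
ENDPOINT = "https://query.wikidata.org/sparql"

# Countries (assumed: Q6256) that list more than one capital (assumed: P36).
QUERY = """
SELECT ?country (COUNT(DISTINCT ?capital) AS ?capitals) WHERE {
  ?country wdt:P31 wd:Q6256 ;
           wdt:P36 ?capital .
}
GROUP BY ?country
HAVING (COUNT(DISTINCT ?capital) > 1)
"""


def build_request(query: str) -> urllib.request.Request:
    """Build a GET request for the endpoint without sending it."""
    params = urllib.parse.urlencode({"query": query, "format": "json"})
    return urllib.request.Request(
        f"{ENDPOINT}?{params}",
        headers={
            "Accept": "application/sparql-results+json",
            # Hypothetical identifier; replace with your own contact details.
            "User-Agent": "example-wikidata-sketch/0.1 (you@example.org)",
        },
    )


def run_query(query: str) -> list[dict]:
    """Send the query and return the result rows (needs network access)."""
    with urllib.request.urlopen(build_request(query), timeout=60) as response:
        return json.load(response)["results"]["bindings"]
```

Calling `run_query(QUERY)` would send the request. Building the request separately from sending it keeps the sketch easy to inspect offline, and a descriptive User-Agent is a common courtesy toward a shared, nonprofit-run service.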

What’s your sense of how open data projects will evolve over the next few years?

Large tech companies have been extracting value from the commons for many years, be that in open data or free software. As a society, we need to understand that this is undermining the commons we all rely on, and we need to expect and demand better. I believe, especially in the age of LLMs and related technologies, that we need to understand what this technology is built on, and that it is often built without giving anything back.

So I would like to see people contribute more to open projects like Wikidata and then build on that data, all the while giving back to the project they rely on.

The alternative is a world where we as a society do not have influence over the technology we use every day and that democracy depends on. Instead, we’d be beholden to the black-box technology we are given. That’s not a future I wish to live in.

What do you mean about LLMs not “giving back”?

These large AI companies are basically strip-mining the internet. They will undermine the source of a lot of the material that they’re training their models on. If they’re not sending people back to projects like Wikipedia or Wikidata, or many others, they’re basically cutting them off from the people who actually make the answers possible.

Are you saying the sites providing the content might disappear?

So someone put out a blog post about Stack Overflow analyzing how large language models influenced the traffic on their site. And the analysis suggested that if people are just asking their programming questions to an LLM, why would they need to go to Stack Overflow anymore, right? But why is the LLM able to answer programming questions? Because it has been trained on something like Stack Overflow.

So what should AI companies do to ensure the vitality of the communities they’re taking material from?

Two things. One is recognition in the sense of “Hey, this answer you’re getting here is coming from these places,” and they’re starting to do that, so that people can find their way back to the source of that content. And the second is that they’re making a lot of money, and they should give some of that money back to the projects that are making them that money.

How do you handle conflicts when contributors from different countries or just different perspectives disagree about how to structure or present information?

There are community processes to handle editorial disputes, starting with discussing the pros and cons of different ways of describing a situation (in what is called a WikiProject) together with people interested in the same topic. Often, more senior editors can help resolve disagreements that way, for example, by pointing to best practices for modeling or by asking for references for a specific data point someone wants to add. In the worst case, an entry might get locked down by an admin if different parties can’t stop editing back and forth on a particular point.

Many potentially divisive topics thankfully never even escalate to that level, in part because of how Wikibase, the underlying software of Wikidata, is built. Based on many years of experience in Wikidata’s sister project Wikipedia, from the start we centered it around the concept of verifiability. That means an editor cannot just show up and claim something. They need to have a reliable and trustworthy source for what they claim, such as an article in a reputable newspaper. 

Additionally, we allow differing views and even conflicting claims to stand side by side, something especially important for disputed territories, for example, and then add context to these claims that helps [explain] the nuance of the situation. This can include things such as which international body supports or does not support a specific territorial claim.
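
One way to picture this is as a list of statements on the same property, each carrying its own sources and context. The sketch below is an illustration in Python, not Wikibase's actual data model or API. The class names, field names, and placeholder values are all invented for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Statement:
    """One claim about an entity, with its sources and context (hypothetical)."""
    prop: str
    value: str
    references: list[str] = field(default_factory=list)
    qualifiers: dict[str, str] = field(default_factory=dict)


@dataclass
class Entity:
    """A hypothetical entry holding any number of statements."""
    label: str
    statements: list[Statement] = field(default_factory=list)

    def add(self, statement: Statement) -> None:
        # Verifiability: refuse claims that come without a source.
        if not statement.references:
            raise ValueError("a statement needs at least one reference")
        # Append rather than overwrite, so differing views can coexist.
        self.statements.append(statement)

    def values(self, prop: str) -> list[Statement]:
        """Return every statement recorded for the given property."""
        return [s for s in self.statements if s.prop == prop]


territory = Entity("Example Territory")  # placeholder, not a real entry
territory.add(Statement("country", "State A",
                        references=["placeholder source 1"],
                        qualifiers={"supported by": "International Body X"}))
territory.add(Statement("country", "State B",
                        references=["placeholder source 2"],
                        qualifiers={"supported by": "International Body Y"}))

for s in territory.values("country"):
    print(s.value, s.qualifiers)
```

Both claims remain on the entry, and the qualifiers record who supports each one, so an application reading the data can decide how to present the dispute instead of silently seeing only one side.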

Your 25,000 contributors span 190-plus countries. How do you ensure voices from marginalized communities aren’t drowned out by more resourced contributors?

We are dedicating a lot of effort to ensuring that everyone can contribute data that is relevant to them and their communities. For example, we are running editing workshops across Africa to help more people make their first steps in contributing to Wikidata. We are also working on improvements to editing from mobile devices to make sure people who primarily or even exclusively access Wikidata from a mobile phone have a good experience contributing to the world’s knowledge.

What has surprised you most about how developers worldwide have used Wikidata’s open data?

What astonishes me the most is the fact that many of the applications people are building with the help of Wikidata are ones that I would never have imagined when we first started. Take KDE Itinerary, for example, the digital travel assistant that keeps track of all your travel documents and—thanks to Wikidata—reminds you to bring an adapter for your laptop when traveling to a country with different power outlets. Or eRutter, the historical sea-routing website that lets you imagine how you might have traveled from continent to continent in ancient times. 

A Bangladeshi developer with Wikidata can access the same data infrastructure as Google. How does open data level the playing field for innovation in the Global South?

A lot of applications today are powered by data. As a developer, that means you don’t just have to actually build your application, you also have to collect and maintain the data your application relies on. For a large company, that is not as big of a problem, but if you are an individual developer or small team, this really limits what you are able to build. This is where Wikidata is there to support you, with basic data about the things that matter in the world, from people to events to locations to culture, you name it. 

Thanks to a dedicated community of over 25,000 editors on Wikidata, you have access to up-to-date and reliable basic data to build upon. And not just that: Wikidata also provides you with links to 10,000 other websites, archives, social media sites, and more to make it easier to access additional data about the topics you need for your application.


https://www.fastcompany.com/91391335/big-tech-locks-data-away-wikidata-gives-it-back-to-the-internet?partner=rss&utm_source=rss&utm_medium=feed&utm_campaign=rss+fastcompany&utm_content=rss

Created 25.08.2025, 10:50:10

