When Deepak Varuvel Dennison's father was diagnosed with a tongue tumour, his family faced a medical dilemma that would reveal a much larger problem in our digital age. While his sister, trained in Western medicine, recommended surgery, his parents trusted traditional Siddha remedies from their South Indian heritage.
After extensive online research, Deepak sided with surgery. But his father secretly took herbal concoctions from a traditional healer. Months later, the tumour disappeared completely. This personal experience sparked Deepak's realisation that the seemingly all-knowing internet contains enormous gaps – and generative AI is making this problem worse.
The Digital Knowledge Imbalance
As a researcher at Cornell University studying responsible AI systems, Deepak discovered how generative AI entrenches profound power imbalances in knowledge. The early internet was dominated by English and Western institutions, leaving whole worlds of human experience undigitised. Now, GenAI trained on this skewed digital corpus threatens to permanently erase alternative knowledge systems.
A large-scale study from September 2025 revealed that around half of ChatGPT queries seek practical guidance or information. These systems appear neutral but privilege dominant Western ways of knowing while marginalising oral traditions, embodied practices, and languages classified as "low-resource" in computing.
The data disparities are stark. Common Crawl, one of the largest training data sources, is 45% English, although English is spoken by only 19% of the global population. Hindi, the world's third most spoken language, is used by 7.5% of humanity yet represents just 0.2% of Common Crawl data. Tamil, Deepak's mother tongue, spoken by 86 million people, accounts for a mere 0.04%.
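To make the gap concrete, one rough way to read these figures is as a representation ratio: a language's share of the corpus divided by its share of speakers. The short Python sketch below uses only the English and Hindi percentages quoted above. The function name and the ratio itself are illustrative assumptions, not a metric taken from the study. Tamil is left out because the figure given for it is a speaker count rather than a population share.

```python
# Illustrative only: "representation ratio" is a hypothetical helper, not an
# established metric. A value of 1.0 would mean a language's share of the
# corpus matches its share of the world's speakers.

def representation_ratio(content_share_pct, speaker_share_pct):
    """Corpus share divided by speaker share (both given in percent)."""
    return content_share_pct / speaker_share_pct

# (share of Common Crawl in %, share of global population in %), as quoted above.
figures = {
    "English": (45.0, 19.0),
    "Hindi": (0.2, 7.5),
}

ratios = {
    language: representation_ratio(content, speakers)
    for language, (content, speakers) in figures.items()
}

for language, ratio in ratios.items():
    print(f"{language}: {ratio:.3f}")

# How many times better represented English is than Hindi, per speaker.
gap = ratios["English"] / ratios["Hindi"]
print(f"English-to-Hindi gap: about {gap:.0f}x")
```

On these numbers English is over-represented by a factor of more than two, while Hindi has only a small fraction of the presence its speaker base would suggest.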
The Real-World Consequences
This isn't just about digital representation – it has tangible impacts on everything from architecture to water management. Dharan Ashok, chief architect at Thannal in India, works to revive natural building techniques but faces challenges because this knowledge exists primarily in oral traditions and native languages.
"The greatest challenge lies in the fact that this knowledge is largely undocumented," Dharan noted. He recounted missing the chance to learn specific limestone brick-making techniques when the last artisan with that knowledge died.
In Bengaluru, once celebrated for its sophisticated water management, traditional knowledge held by the Neeruganti community has been sidelined. These water managers once controlled flow, ensured fair distribution, and guided farmers on water-efficient crops based on rainfall patterns. With modernisation, their community-led approach gave way to centralised systems, contributing to Bengaluru's current water crises.
Experts now recognise that reviving these lake systems requires the very knowledge that's been marginalised – knowledge that exists only in native languages, is passed on orally, and is absent from digital spaces and AI systems.
Why AI Amplifies Existing Biases
The problem extends beyond data gaps to how large language models function. LLMs exhibit "mode amplification" – they disproportionately emphasise statistically dominant patterns rather than reflecting the original distribution of ideas in training data.
If training data contains 60% references to pizza, 30% pasta, and 10% biryani as favourite foods, LLMs might overproduce pizza responses while underrepresenting or omitting biryani altogether. This occurs because models optimise for predicting the most probable next word.
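A toy calculation shows how this can happen. The Python sketch below is not how any particular LLM is implemented. It assumes a model has learned the 60/30/10 split exactly, and then compares two simplified decoding strategies: always choosing the most probable answer, and sampling at a low "temperature", which sharpens the distribution. The function name `sharpen` and the temperature value of 0.5 are assumptions chosen for illustration.

```python
# Toy illustration of mode amplification. Assumes the model has learned the
# training distribution exactly; real systems are far more complicated.

def sharpen(dist, temperature):
    """Rescale probabilities as p ** (1 / temperature), then renormalise.

    Temperatures below 1 push probability mass toward the most likely option.
    """
    weights = {item: p ** (1.0 / temperature) for item, p in dist.items()}
    total = sum(weights.values())
    return {item: w / total for item, w in weights.items()}

# Share of each favourite food in the hypothetical training data.
training = {"pizza": 0.6, "pasta": 0.3, "biryani": 0.1}

# Strategy 1: always pick the single most probable answer.
greedy_choice = max(training, key=training.get)

# Strategy 2: sample from a sharpened distribution (temperature is an assumption).
sharpened = sharpen(training, temperature=0.5)

print("Greedy answer every time:", greedy_choice)
for food, share in sharpened.items():
    print(f"{food}: {training[food]:.0%} in data -> {share:.1%} in output")
```

Under greedy decoding biryani never appears at all. Under the sharpened distribution pizza rises from 60% to roughly 78% of answers, while biryani falls from 10% to about 2%.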
Reinforcement learning from human feedback embeds creators' values into models. Commercial pressures further skew development toward English-speaking professionals willing to pay premium subscriptions, making them the implicit template for "superintelligence."
Studies consistently show LLMs reflect Western cultural values, overrepresent dominant groups, reinforce their biases, and demonstrate higher factual accuracy for North American and European topics. Even in travel recommendations, they generate richer content for wealthier nations.
As AI-generated content proliferates online, it creates a feedback loop where new models train on previous AI outputs, continuously amplifying dominant ideas while obscure knowledge fades – a phenomenon researcher Andrew Peterson calls "knowledge collapse."
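The compounding effect of that loop can be sketched with a small simulation. This is a deliberately simplified toy, not Peterson's model: it assumes each new generation of model is trained entirely on the previous generation's output, and that every generation sharpens the distribution it learned by the same amount. The favourite-food shares of 60% pizza, 30% pasta and 10% biryani are hypothetical starting values, and the temperature of 0.5 and the three generations are arbitrary assumptions as well.

```python
# Toy simulation of a training feedback loop. Assumes each generation learns
# only from the previous generation's output, which is a strong simplification.

def sharpen(dist, temperature):
    """Rescale probabilities as p ** (1 / temperature), then renormalise."""
    weights = {item: p ** (1.0 / temperature) for item, p in dist.items()}
    total = sum(weights.values())
    return {item: w / total for item, w in weights.items()}

def simulate(initial, generations, temperature=0.5):
    """Return the distribution after each generation, starting with the original."""
    history = [dict(initial)]
    for _ in range(generations):
        # The next model's training data is the previous model's output.
        history.append(sharpen(history[-1], temperature))
    return history

# Hypothetical starting shares for three favourite foods.
human_data = {"pizza": 0.6, "pasta": 0.3, "biryani": 0.1}
history = simulate(human_data, generations=3)

for generation, dist in enumerate(history):
    shares = ", ".join(f"{food} {share:.4%}" for food, share in dist.items())
    print(f"Generation {generation}: {shares}")
```

In this toy setting the minority answer shrinks with every round and the dominant one approaches total coverage within three generations. Real training pipelines mix in fresh human data, so the sketch shows the direction of the effect rather than its speed.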
The Institutional Barriers
The challenges aren't merely technical. A senior leader developing an AI chatbot serving 8 million farmers across Asia and Africa acknowledged excluding effective local practices because they lack research documentation. The rationale? Research-backed advice offers liability protection, creating a system that prioritises defensibility over usefulness.
Perumal Vivekanandan's organisation Seva has documented over 8,600 local agricultural practices in India but faces constant roadblocks. Funders question the scientific legitimacy of Indigenous knowledge, while research institutions show little incentive to validate it. This creates a catch-22: without validation, support is scarce; without support, validation is unaffordable.
This demonstrates that while GenAI accelerates knowledge erasure, the root cause lies in entrenched power structures that have long marginalised local and Indigenous knowledge.
Why This Loss Matters Globally
The disappearance of local knowledge isn't just a tragedy for the communities that hold it – it disrupts the larger web of understanding that sustains human and ecological wellbeing. Like biological species adapted to specific environments, human knowledge systems evolved for particular places.
When these systems disappear, consequences ripple beyond their origins. Wildfire smoke ignores postcodes, polluted water crosses state lines, rising temperatures transcend borders, and infectious germs don't require visas. We're enmeshed in shared ecological systems where local wounds become global aches.
Deepak acknowledges his own contradiction: advocating for local knowledge systems while remaining uncertain about his father's herbal remedies. This uncertainty reflects the complexity we must navigate – neither blindly accepting traditional knowledge nor dismissing it entirely.
As the climate crisis reveals cracks in dominant knowledge paradigms, and AI developers promise technological solutions, crucial questions remain: Can we engage authentically with dismissed knowledge systems? Or will we continue erasing forms of understanding, eventually scrambling to colonise Mars because we never learned to live sustainably on Earth?
The intelligence we most need might be the capacity to see beyond hierarchies determining which knowledge counts. Without this foundation, even massive investments in superintelligence will continue erasing knowledge systems that took generations to develop.