
Last May, as India conducted the largest democratic election in history, I spoke to reporters and fact-checkers in the country about the challenges they faced identifying political deepfakes. Many of them pointed to shortcomings in existing AI detection tools and the biases baked into them.
Most commercial detection tools, and even models developed by academic researchers, are trained on datasets collected in and about the Global North. Audio detection models index heavily on training clips in English and other Western languages. Image and video detection models index heavily on training imagery of white people and people with lighter skin tones. Cultural nuances and references from regions in the Global South are often lost in the process.
The impact of this baked-in bias: a lower rate of accuracy for a reporter trying to test a clip circulating in a regional Indian language on WhatsApp. Or at least, anecdotally, that was the experience of many Indian journalists.
A new paper, “News about Global North is considered truthful! The Geopolitical veracity gradient in Global South news,” is going beyond anecdotes to put harder numbers behind this conversation.
“Global North hegemony — and thus, cultural imperialism — [is] substantively percolating into AI algorithms built to mitigate fake news,” write authors Sujit Mandava and Sahely Bhadra from the Indian Institute of Technology Palakkad, and Deepak P from Queen’s University Belfast.
Rather than investigating deepfake detection models, the study looks more broadly at AI models used for “fake news detection.” In particular, the researchers tested a popular model released in 2020 called FNDNet, which has scored upwards of 98% accuracy at identifying “fake news” on social media in benchmark tests.
One section of the paper jumped out at me. The authors suggest that foundation models trained on “Global North data” are likely to have strong lexical associations between common words used in the Global North and “real/fake labels.” They also theorize that these “Global North words” appear less frequently in fake news from the Global South.
Their hypothesis is that these models therefore have “very limited utility” for identifying fake news in the Global South and output “a lot of misclassifications of fake news as real” — in other words, false negatives.
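To make that mechanism concrete, here is a toy sketch of my own, not FNDNet or the authors’ experiment: a simple bag-of-words classifier trained on invented “Global North” snippets. The cue words it learns to associate with the fake/real labels barely appear in an invented “Global South” test snippet, so the model is left with little evidence either way and cannot reliably flag the item as fake.

```python
# Toy illustration of the lexical-association hypothesis.
# Not FNDNet and not the paper's data: every snippet below is invented.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

# Hypothetical "Global North" training snippets (1 = fake, 0 = real).
train_texts = [
    "senator denies congress hoax about vaccine microchips",
    "white house confirms federal budget agreement",
    "tabloid alleges mayor faked moon landing photos",
    "city council approves new transit funding plan",
]
train_labels = [1, 0, 1, 0]

vectorizer = CountVectorizer()
X_train = vectorizer.fit_transform(train_texts)
clf = LogisticRegression().fit(X_train, train_labels)

# Hypothetical "Global South" test snippet: none of the learned cue words
# appear, so the model's probability of "fake" sits near chance -- it has
# learned nothing that transfers, the failure mode the authors describe.
test_text = ["viral WhatsApp forward says panchayat banned monsoon sowing"]
X_test = vectorizer.transform(test_text)
print(clf.predict_proba(X_test))
```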
This hypothesis was borne out by their initial tests of FNDNet. Consider the confusion matrix in the paper, a table that summarizes the performance of a classification model. In the table, “GN” refers to Global North and “GS” refers to Global South. As the authors explain it: “among the incorrect decisions, the bottom-left cell corresponds to false positives (real news classified as fake) and the top-right cell corresponds to false negatives (fake news classified as real).”
As predicted, a model trained on Global North data and tested on Global South fake news had a very high rate of false negatives (40% higher than its rate of false positives).
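For readers less familiar with the table layout, here is a minimal sketch of how those two rates are read off a 2×2 confusion matrix arranged the way the paper describes: actual class in the rows (fake on top, real on the bottom), predicted class in the columns (fake on the left, real on the right). The counts are invented placeholders, not the paper’s results.

```python
# Reading FN/FP rates off a 2x2 confusion matrix laid out as the paper
# describes: rows = actual class (fake, then real), columns = predicted
# class (fake, then real). Counts are placeholders, not the paper's numbers.
import numpy as np

confusion = np.array([
    [30, 70],  # actual fake: 30 caught (top-left), 70 missed = false negatives (top-right)
    [25, 75],  # actual real: 25 flagged as fake = false positives (bottom-left), 75 correct
])

true_pos, false_neg = confusion[0]
false_pos, true_neg = confusion[1]

false_negative_rate = false_neg / (true_pos + false_neg)  # fake news waved through as real
false_positive_rate = false_pos / (false_pos + true_neg)  # real news wrongly flagged as fake

print(f"FNR = {false_negative_rate:.0%}, FPR = {false_positive_rate:.0%}")
```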
For the full set of findings you can read the paper here, but the authors admit this is only “coarse-grained support” for their hypothesis and call for additional research.
Still, the authors suggest this data shows that the benchmark tests currently used to evaluate models like FNDNet reproduce the same data inequities and validate Global North biases in “fake news detection” models. They also call for AI developers to publish “geopolitical model cards,” which would require more transparency about which cultures and peoples are actually represented in their training data.
“Such regulations would force creators of models developed for Global North or otherwise parochial settings to admit so,” the authors write.
The call echoes what I heard from Indian researchers back in May 2024.
“Data is the new oil, right? What kind of oil are you going to put in your engine?,” Mayank Vatsa, a deepfake researcher at the Indian Institute of Technology (IIT) Jodhpur, told me. “Many of the tools which are available…their diversity, the population that they’re looking at, is very different from the population that I’m looking at right now.”