Iceberg Ahead: The Unseen Data Divide in AI


An estimated 85% of AI's training data comes from the global north, underscoring the digitalization gap facing the global south. This imbalance is more than just a discrepancy; it represents a profound division that risks distorting AI's applicability and effectiveness across different global communities.

As development professionals, how can we tackle this digitalization divide to ensure AI truly represents a diverse range of human experiences? What innovative strategies can you implement to integrate the unique cultural and digital landscapes of the global south into the AI development process?


7 Replies
We need to target underserved populations to eliminate technology deserts and make AI accessible to everyone, including people with disabilities, those over 65, the unhoused, the incarcerated, and people of the global majority, by expanding programs that meet them where they are and guide them through each step.

Great question and great response! This disparity is only going to get worse: it's predicted that 90% of digital content will be AI-generated (from models trained primarily on data from the global north) by the end of 2026. So the sooner we can provide access to these underrepresented groups, educate them on how to use AI, and show its value so they continue using it, the better our chances of addressing this gap before it worsens.

Big fan of this. Glad this is being talked about.
Our organization serves rural areas, and many other community members who are not represented well in census data. I am looking forward to improvements in data gathering and sharing.
Fully agree.
This is a really important conversation.

The Washington Post had a good article late last year that really demonstrated this: https://www.washingtonpost.com/technology/interactive/2023/ai-generated-images-bias-racism-sexism-st...
This is indeed a crucial topic. These are some strategies to mitigate it that I have found useful; curious to read other tactics!

- In the system development phase:
1. Diverse Development Teams: Diversity in development teams is key to incorporating various perspectives and mitigating potential biases.
2. 360-View Mindset: Adopt a holistic approach by involving users, beneficiaries, and other stakeholders throughout the development process to ensure a comprehensive perspective is considered. I can't stress this enough: there were many times we were able to correct some of the system's results thanks to comments from the people who worked closely with the beneficiaries.
- In the testing phase:
1. Bias Analysis: Conduct a thorough analysis of the model's results during testing to identify and address biases. For example, run detailed statistical examinations, including simple descriptive statistics broken down by gender, age, or other characteristics pertinent to the studied population.
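To make the testing-phase idea concrete, here is a minimal sketch of such a descriptive bias check in plain Python. The function names, the record structure, and the `gender`/`approved` fields are all hypothetical, stand-ins for whatever attributes and model outcomes apply to your population:

```python
from collections import defaultdict

def group_outcome_rates(records, group_key, outcome_key):
    """Compute the positive-outcome rate for each demographic group."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for record in records:
        group = record[group_key]
        totals[group] += 1
        positives[group] += 1 if record[outcome_key] else 0
    return {group: positives[group] / totals[group] for group in totals}

def demographic_parity_gap(rates):
    """Largest difference in positive-outcome rates across groups."""
    return max(rates.values()) - min(rates.values())

# Hypothetical model outputs for a benefits-eligibility system.
predictions = [
    {"gender": "F", "approved": True},
    {"gender": "F", "approved": False},
    {"gender": "M", "approved": True},
    {"gender": "M", "approved": True},
]

rates = group_outcome_rates(predictions, "gender", "approved")
gap = demographic_parity_gap(rates)
```

A large gap does not prove bias on its own, but it flags a disparity that the team should investigate with stakeholders before deployment, exactly the kind of signal the comments from people close to the beneficiaries helped us interpret.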