Object recognition is a computer vision (CV) technique that lets computers identify objects, such as common household items. Many companies have developed their own systems, which keep improving over time.
The closest example of object recognition tech is probably right on your phone: Google Lens. Facebook uses the technique to automatically add alt text to images to help visually impaired users.
However, a recent study titled “Does Object Recognition Work for Everyone?”, published by a team of Facebook AI researchers, gives a different angle to the story.
It says these systems work much better for people with more money; for example, object recognition could yield better results for a household earning $3,500 a month than for one earning $50 a month.
The research evaluated six object recognition systems from various tech companies: Google, Microsoft, IBM, Amazon, Clarifai, and Facebook itself.
Facebook hasn’t shared specific numbers for individual companies, but according to the aggregated results, the six systems performed around 20% better for the richest households than for the poorest.
To test these systems, the researchers used an open-source dataset called Dollar Street, which contains images of household items taken from 264 homes across 50 countries.
Broken down by geography, the object recognition systems were more likely to recognize objects in the United States and Europe than in Africa and Asia.
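The paper’s headline comparison boils down to grouping predictions by household income and comparing accuracy per group. Here is a minimal sketch of that idea, not the researchers’ actual code: the records, the $500 threshold, and the `accuracy_by_income` helper are all hypothetical illustrations.

```python
from collections import defaultdict

# Hypothetical evaluation records: (monthly_income_usd, recognized_correctly).
# In the real study these would come from running each system on Dollar Street images.
records = [
    (50, False), (50, True), (50, False),
    (3500, True), (3500, True), (3500, False),
]

def accuracy_by_income(records, threshold=500):
    """Bucket results into low/high income groups and compute per-group accuracy."""
    totals = defaultdict(lambda: [0, 0])  # bucket -> [correct, total]
    for income, correct in records:
        bucket = "low" if income < threshold else "high"
        totals[bucket][0] += int(correct)
        totals[bucket][1] += 1
    return {bucket: correct / total for bucket, (correct, total) in totals.items()}

print(accuracy_by_income(records))  # e.g. {'low': 0.333..., 'high': 0.666...}
```

A gap like the one printed above, aggregated over six systems and thousands of images, is what the reported ~20% difference refers to.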
The possible reason behind the bias?
The issue may lie in how these systems are trained. Facebook’s researchers note that many of them are trained on publicly available datasets like ImageNet and COCO.
Most of the images in these datasets come from Europe and North America; a far smaller share comes from Africa and Southeast Asia.
“Such systems may be much better at recognizing a traditionally Western wedding than a traditional wedding in India, for example, because they were not trained on data that included extensive examples of Indian weddings.”
The researchers also highlight that search queries, not just images, play an important role: some object recognition systems are trained on images retrieved from the internet using queries that are mostly in English.
For example, the image results for the keyword “wedding” would be very different from those for its Hindi counterpart, “शादी.”
To fix the bias in such AI systems, the researchers plan to do the obvious and improve the training process. They intend to train their systems using hashtags in more languages than just English, and the datasets could be improved by incorporating location data showing where the images were taken.
You can read more about the research in this paper.
via VentureBeat