Researchers from Cornell and its partner institutions conducted a study on the kinds of mistakes humans make when examining images. The findings could inform computer algorithms that help people make better decisions about visual information, such as moderating online content or reading an X-ray.
The researchers analyzed over 16 million human predictions of whether a neighborhood voted for Donald Trump or Joe Biden in the 2020 U.S. presidential election based on a single Google Street View image. Humans as a group performed well at the task, but the computer algorithm was more effective at differentiating between Biden and Trump country.
The researchers also classified the common ways people err and identified objects, such as American flags and pickup trucks, that lead them astray.
‘We’re trying to understand, where an algorithm has a more effective prediction than a human, can we use that to help the human, or make a better hybrid human-machine system that gives you the best of both worlds,’ said J.D. Zamfirescu-Pereira, a graduate student at the University of California, Berkeley, and the paper’s first author.
The study, titled ‘Trucks Don’t Mean Trump: Diagnosing Human Error in Image Analysis’, was presented at the 2022 Association for Computing Machinery (ACM) Conference on Fairness, Accountability, and Transparency (FAccT).
Researchers have recently begun paying close attention to algorithmic bias— when algorithms make errors that disadvantage women, minorities, and other marginalized populations.
‘Algorithms can screw up in any one of a myriad of ways and that’s very important,’ said Emma Pierson, assistant professor of computer science at the Jacobs Technion-Cornell Institute at Cornell Tech. ‘But humans are themselves biased and error-prone, and algorithms can provide very useful diagnostics for how people screw up.’
The researchers made use of anonymized data from a New York Times interactive quiz that showed pictures from 10,000 locations in the country and asked readers to guess how the neighborhood voted. A machine learning algorithm was trained to make the same prediction from a subset of the Google Street View images, using real-world voting results as labels. The algorithm’s performance was then compared to that of the readers.
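As a rough illustration of that setup, the sketch below fine-tunes a pretrained image classifier on street-level photos labeled by how the surrounding neighborhood actually voted. The directory layout, model choice, and hyperparameters here are assumptions for the example, not details from the study.

```python
# Hypothetical sketch: fine-tune a pretrained CNN to predict a neighborhood's
# 2020 vote (Biden vs. Trump) from a single street-level image.
# Directory layout, model choice, and hyperparameters are assumptions,
# not the study's actual setup.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import datasets, models, transforms

transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

# Assumed layout: street_view/train/{biden,trump}/*.jpg, with each image
# labeled by how the surrounding precinct actually voted.
train_data = datasets.ImageFolder("street_view/train", transform=transform)
loader = DataLoader(train_data, batch_size=32, shuffle=True)

model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
model.fc = nn.Linear(model.fc.in_features, 2)  # two classes: Biden, Trump

optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()

model.train()
for epoch in range(3):
    for images, labels in loader:
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
```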
The algorithm predicted the correct answer 74% of the time. When responses were averaged to reflect ‘the wisdom of the crowd’, humans were correct 71% of the time, but individual humans scored only 63%.
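A minimal sketch of that comparison, using made-up guesses, shows why aggregating responses with a per-image majority vote can beat any single respondent:

```python
# Minimal sketch of individual vs. "wisdom of the crowd" accuracy.
# The guesses and labels below are invented for illustration only.
import numpy as np

# rows = respondents, columns = images; 1 = guessed Trump, 0 = guessed Biden
guesses = np.array([
    [1, 0, 1, 0, 1],
    [1, 1, 1, 0, 0],
    [0, 0, 1, 1, 1],
])
truth = np.array([1, 0, 1, 0, 1])  # how each neighborhood actually voted

individual_acc = (guesses == truth).mean()                # average single-person accuracy
crowd_guess = (guesses.mean(axis=0) >= 0.5).astype(int)   # majority vote per image
crowd_acc = (crowd_guess == truth).mean()                 # crowd accuracy

print(f"individual: {individual_acc:.2f}, crowd: {crowd_acc:.2f}")
```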
People incorrectly chose Trump when the image showed wide-open skies or pickup trucks. In a New York Times article, participants stated that American flags made them more likely to predict Trump, even though neighborhoods with flags were split evenly between the two candidates.
The researchers classified the human mistakes into bias, noise, and variance, the same categories used to evaluate errors from machine learning algorithms. Bias represents errors in the wisdom of the crowd, such as associating pickup trucks with Trump. Variance covers cases where an individual makes a bad call even though the crowd, on average, was right. Noise is when the image doesn’t provide useful information, such as a house with a Trump sign in a neighborhood that mostly voted for Biden.
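One way to read that taxonomy operationally: for each wrong guess, check whether the crowd as a whole also gets the image wrong (bias), whether the crowd is right but this respondent is not (variance), or whether the image simply carries no usable signal (noise). The sketch below follows that logic; the threshold and the "informative image" flag are assumptions for illustration, not the paper’s exact procedure.

```python
# Illustrative sketch of sorting human errors into the bias / variance / noise
# buckets described above. The decision rule is an assumption for demonstration.

def classify_error(individual_guess, crowd_prob_trump, truth, image_informative=True):
    """Label a single guess.

    individual_guess: 0 (Biden) or 1 (Trump) from one respondent
    crowd_prob_trump: fraction of all respondents who guessed Trump for this image
    truth: 0 or 1, how the neighborhood actually voted
    image_informative: whether the image carries a usable signal at all
    """
    crowd_guess = int(crowd_prob_trump >= 0.5)
    if individual_guess == truth:
        return "correct"
    if not image_informative:
        # e.g. a lone Trump sign in a neighborhood that mostly voted Biden
        return "noise"
    if crowd_guess != truth:
        # the crowd as a whole gets this image wrong, e.g. trucks -> Trump
        return "bias"
    # the crowd is right but this individual is not
    return "variance"

print(classify_error(1, 0.8, 0))  # bias: most respondents wrongly guess Trump
print(classify_error(1, 0.2, 0))  # variance: crowd is right, this individual is not
```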
Breaking human errors into these categories may help improve human decision-making. It could also lead to a better understanding of how to combine machine and human decision-making in human-in-the-loop systems, where humans provide input into otherwise automated processes.
‘You want to study the performance of the whole system together— humans plus the algorithm, because they can interact in unexpected ways,’ said Pierson.
By Marvellous Iwendi.
Source: Cornell Tech