
Parameswaran Aims for Algorithms That Would Add Perspective to the Voices From the Crowd

7/10/2018 2:06:23 PM David Mercer, Illinois Computer Science

Assistant Professor Aditya Parameswaran
The effectiveness of algorithms that collect, combine, and coalesce crowdsourced data is limited because they can only take those answers at face value: the algorithms cannot consider the factors behind a person’s answers and opinions.

Illinois Computer Science Assistant Professor Aditya Parameswaran plans to use an award from the U.S. Army Research Office’s Young Investigator Program to develop techniques to refine crowdsourced data by considering the personal perspectives that influence those answers.

The project aims to improve the quality of crowdsourced information, the primary source of data for training machine learning algorithms used for everything from screening video content to driving autonomous vehicles.

“This is hugely important to generate the training data that you would need to train the algorithms that would eventually help you decide, ‘Hey, is this content that should be viewable by children?’ Or, ‘Based on the video that I’m seeing or what the camera is seeing, is there a person walking in front of my car?’” Parameswaran said.

The Army Research Office considers its Young Investigator Program awards to be “one of the most prestigious awards bestowed by the Army on outstanding scientists beginning their independent careers.”

For his project, Parameswaran will build on his own work in the crowdsourcing arena.

Parameswaran is an expert on incorporating people into data-analytics systems, and he co-authored “Crowdsourced Data Management: Industry and Academic Perspectives.” The book is considered the definitive guide to crowdsourcing at scale.

Existing crowdsourcing algorithms are only capable of providing a simple consensus of the members of the crowd: If five people watch a piece of video taken from a car driving down the street, and two believe the pedestrian seen waiting by the roadside could walk in front of the car (a reason for a human to intervene and apply the brakes) while the other three see no reason for concern, then the opinion of the latter group will be judged correct.
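The simple consensus described above amounts to a majority vote. A minimal sketch in Python follows; the function name `majority_vote` and the labels `"intervene"` and `"no_concern"` are hypothetical, chosen only to mirror the five-viewer example, and do not come from any particular crowdsourcing system.

```python
from collections import Counter


def majority_vote(labels):
    """Return the most common label and the share of votes it received."""
    counts = Counter(labels)
    label, votes = counts.most_common(1)[0]
    return label, votes / len(labels)


# Five hypothetical crowd workers judge the same video clip.
votes = ["intervene", "intervene", "no_concern", "no_concern", "no_concern"]

winner, share = majority_vote(votes)
print(winner, share)  # no_concern 0.6
```

The two dissenting answers are discarded outright, and nothing about why those workers disagreed survives the vote.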

Example of correlations between annotations on data items in the same batch. Annotators are asked to label whether a review on the movie “The Imitation Game” crawled from IMDb is positive. Assigning each review-movie pair to distinct annotators can be costly, while assigning a batch of reviews together with a movie to annotators might affect their judgments.

In his proposal, Parameswaran provides what he calls a recipe for a more sophisticated algorithm to sort the many opinions provided by crowd workers, clarify what they mean, and do it in ways that provide insight into the perspectives that influence their decisions and answers.

Sorting those answers over a wide range of subjects and situations should help cluster crowd workers into general teams with broad areas of agreement. That would allow answers and opinions provided by future crowd workers to be filtered by their perspectives.
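The article does not describe how the proposed algorithm would form these groups, so the following is only an illustrative sketch of the general idea, not Parameswaran's method. It assumes each worker's past answers are stored as a dictionary of item to label, measures how often two workers agree on the items they share, and links workers whose agreement meets a threshold. The names `agreement` and `group_workers`, the 0.75 threshold, and the sample data are all hypothetical.

```python
from itertools import combinations


def agreement(a, b):
    """Fraction of shared items on which two workers gave the same answer."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    return sum(a[item] == b[item] for item in shared) / len(shared)


def group_workers(answers, threshold=0.75):
    """Link workers whose pairwise agreement meets the threshold.

    Groups are the connected components of the resulting graph,
    tracked with a small union-find structure.
    """
    parent = {worker: worker for worker in answers}

    def find(worker):
        while parent[worker] != worker:
            parent[worker] = parent[parent[worker]]  # path compression
            worker = parent[worker]
        return worker

    for u, v in combinations(answers, 2):
        if agreement(answers[u], answers[v]) >= threshold:
            parent[find(u)] = find(v)

    groups = {}
    for worker in answers:
        groups.setdefault(find(worker), []).append(worker)
    return sorted(sorted(group) for group in groups.values())


# Hypothetical history: 1 = "human should intervene", 0 = "no concern".
answers = {
    "ann": {"clip1": 1, "clip2": 1, "clip3": 1, "clip4": 0},
    "bob": {"clip1": 1, "clip2": 1, "clip3": 1, "clip4": 1},
    "cy": {"clip1": 0, "clip2": 0, "clip3": 0, "clip4": 0},
    "dee": {"clip1": 0, "clip2": 0, "clip3": 1, "clip4": 0},
}

print(group_workers(answers))  # [['ann', 'bob'], ['cy', 'dee']]
```

In this toy data the cautious workers and the relaxed workers fall into separate groups, so a new answer could be read in light of the group its author most resembles.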

“Let’s say in my case, I’m just not a very careful person, and because I’m not a very careful person I say the car will do just fine. While you may be a much more conservative person and say, ‘Yes, the person needs to intervene in these circumstances,’” Parameswaran said.

The Army-funded research is aimed at solving military problems that are similar to those faced by companies that crowdsource training data, he said.

“They have a lot of text, video, and image data that they need labeled in order to drive autonomous applications,” he said, describing situations in which the accuracy of such data could have life-or-death consequences.

But the potential military uses also extend to developing algorithms that serve in an advisory capacity, helping a decision-maker weigh differing opinions from experts and witnesses on the ground.

“Let’s say you have an expert who is looking at a terrain photograph for a disaster rescue setting in Puerto Rico and saying, ‘That building seems to have people stranded on the roof.’ And others weigh in and say, ‘No, I don’t believe those are people. I’ve seen similar things.’ Now how do you weigh opinions in this context?” Parameswaran said.