2/11/2021 9:03:28 AM
Contract from Defense Advanced Research Projects Agency (DARPA) pushes work on Semantic Information Defender (SID) forward.
Over the past five years, Illinois Computer Science professor Heng Ji has believed that research on information extraction could be extended to evaluate the veracity of news stories and change how news media is consumed around the world.
Such a system would use multimedia multilingual information extraction as a basis to analyze media reports from across the world, identify falsified information and prioritize information for analyst review.
Ji is one of several subcontractors to Kitware, Inc., a software research and development company; each subcontractor received funding through a DARPA contract awarded to the company. These subcontractors are fellow researchers, including Shih-Fu Chang, Columbia University; Siwei Lyu, University at Buffalo, SUNY; Ming-Ching Chang, University at Albany, SUNY; Scott Ruston, Arizona State University; Andrew Owens, University of Michigan; and Ewald Enzinger, Eduworks Corporation.
Ji received $892,191 from the DARPA contract, funding that confirmed the potential of this work and will help push it forward. The DARPA funding stems from the Semantic Forensics (SemaFor) program.
World events have also proved the work's relevance. Disinformation has previously spread about topics like national elections and climate change. No topic, though, has created as much urgency for this work as the spread of COVID-19.
In response, Ji began the next stage of this project, called Semantic Information Defender (SID), in August.
“In the past, we could argue about the accuracy of the news information we receive on a topic and life went on,” Ji said. “But if we can’t agree on the fundamental information provided during a time of pandemic, then it becomes a problem that influences life-and-death decisions.
“SID aims to address this issue by performing consistency checking and reasoning and then identifying falsified information.”
She described SID as a product that produces a “personalized documentation system.” This occurs through many different innovations, which include but are not limited to:
- Accurate detection of world events and the entities involved from multiple languages and multiple data modalities.
- A “comprehensive and flexible semantic consistency reasoning approach” to offset any inconsistencies in data used.
- The ability to detect, attribute, and characterize “sophisticated falsified media.” This occurs by evaluating background consistency to determine if a news story is compatible with common sense and world knowledge.
- Use of novel textual-visual semantic consistency analysis to investigate sources of text and video manipulation.
- Media provenance and manipulation history extraction to help characterize the intent behind misinformation.
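The consistency-reasoning idea behind these innovations can be caricatured in a tiny, purely illustrative Python sketch. Here, an assumed upstream information-extraction step has already reduced a report to (subject, relation, object) triples, and a claim is flagged when it contradicts background world knowledge; all names and data below are hypothetical, not SID's actual code or knowledge base.

```python
# Hypothetical sketch: check extracted claims against a background
# knowledge base of known (subject, relation) -> value facts.
BACKGROUND = {
    ("protest", "location"): "Hong Kong",
    ("protest", "date"): "2019-06-09",
}

def check_report(triples):
    """Return reported triples whose (subject, relation) slot has a
    known background value that disagrees with the reported object."""
    flagged = []
    for subj, rel, obj in triples:
        known = BACKGROUND.get((subj, rel))
        if known is not None and known != obj:
            flagged.append((subj, rel, obj, known))
    return flagged

# One consistent claim and one that contradicts background knowledge.
report = [
    ("protest", "location", "Hong Kong"),
    ("protest", "date", "2019-07-01"),
]
print(check_report(report))  # flags only the inconsistent date
```

A real system would of course extract these triples from multilingual text, images, and video, and reason over far richer structures than exact string matches.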
Ji’s primary contribution to this effort comes through identifying falsified news information, even when a source isn’t questionable in its entirety: portions of articles or other forms of media may still contain falsified information.
Her methodologies include zero-shot text extraction, cross-media and cross-event consistency reasoning, multi-modal knowledge graph construction, and text style analysis.
When applied, these methods identify falsified information.
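One way to picture cross-media and cross-event consistency reasoning is as agreement-checking across independent sources reporting the same fact. The sketch below is an illustrative toy, not the project's method: each source fills factual "slots," and a source is flagged when its value disagrees with a strict majority of the others.

```python
# Hypothetical sketch: flag a source whose reported value for a
# factual slot disagrees with the strict majority of other sources.
from collections import Counter

def flag_outliers(reports):
    """reports: dict source -> dict slot -> reported value.
    Returns (source, slot, value, majority_value) for disagreements."""
    # Gather every value reported for each slot across sources.
    by_slot = {}
    for source, slots in reports.items():
        for slot, value in slots.items():
            by_slot.setdefault(slot, []).append((source, value))
    flagged = []
    for slot, pairs in by_slot.items():
        counts = Counter(v for _, v in pairs)
        majority, n = counts.most_common(1)[0]
        if n > len(pairs) // 2:  # only act when a strict majority exists
            for source, value in pairs:
                if value != majority:
                    flagged.append((source, slot, value, majority))
    return flagged

# Illustrative data: two outlets agree, a third deviates.
reports = {
    "outlet_a": {"casualties": "2"},
    "outlet_b": {"casualties": "2"},
    "outlet_c": {"casualties": "200"},
}
print(flag_outliers(reports))  # flags outlet_c's outlier value
```

Majority voting is a deliberately crude stand-in here; the article describes far richer signals, including multi-modal knowledge graphs and text style analysis.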
“For example, if an attack occurred during an event like the Hong Kong protests, we may find falsified information embedded within some media reports,” Ji said. “We can then fix that information by using our natural language generation technique to automatically regenerate a narrative or summary of this information, including the falsehoods.”
Her inspiration came from the steady spread of falsified information over the past several years.
Seeing posts from even her own friends on social media, and the misinformation they contained, was difficult to process. But she also knew this wasn’t an isolated incident; it pointed to a larger problem.
“My immediate thoughts focused on the way people paid attention to news: the way search engines summarized it and ranked it in their search results,” Ji said. “Our system needs to be more intelligent than rankings influenced by importance – especially when ‘importance’ comes from an item’s popularity.
“SID aims to provide more information, so people aren’t drawn to articles containing falsified information just because many people read them.”
This material is based upon work supported by DARPA under Contract No. HR001120C0123. Any opinions, findings and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of DARPA.
Illinois CS Media Contact: Aaron Seidlitz