Audit the Impact of Credibility in Online Information Ecosystems
A WikiCred 2022 Grant Proposal
Project type: Technology
Author: Swapneel Mehta (SwapneelM)
Contact: swapneelsmehta(_AT_)gmail
Requested amount: USD 10,000
Award amount: Unknown
What is your idea?

Create public deliberation mechanisms to audit the impact of credibility in online information-sharing ecosystems such as social platforms. We track the 50 most-shared US-based news domains, drawn jointly from Wikipedia's Perennial Sources list and the Iffy.news database, across Twitter, Meta, Reddit, and Instagram.
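As an illustration of how such a tracked-domain list might be assembled, here is a minimal sketch in Python; the file names, column names, and share-count input are hypothetical placeholders rather than our production pipeline.

```python
import pandas as pd

# Hypothetical exports of the two source lists (file and column names are assumptions).
iffy = pd.read_csv("iffy_news_domains.csv")        # e.g. columns: domain, rating
perennial = pd.read_csv("perennial_sources.csv")   # e.g. columns: domain, status

# Normalize domains so the two lists can be joined on the same key.
for df in (iffy, perennial):
    df["domain"] = df["domain"].str.lower().str.strip().str.removeprefix("www.")

# Union of the two lists, deduplicated.
tracked = pd.concat([iffy[["domain"]], perennial[["domain"]]]).drop_duplicates()

# Rank by externally measured share counts (a hypothetical input) and keep the top 50.
shares = pd.read_csv("domain_share_counts.csv")    # columns: domain, shares
top_50 = (tracked.merge(shares, on="domain", how="left")
                 .fillna({"shares": 0})
                 .nlargest(50, "shares"))
print(top_50["domain"].tolist())
```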


Why is it important?

Our work supports the Wikimedia community in engaging in quantitative discussions around the effects of news sources in the digital information ecosystem, particularly by moving the discussion beyond the question "is this fake news or not?". The metrics we track around how an article is shared appear to be a promising indicator of artificial campaigns to promote it, often termed 'coordinated inauthentic behavior' when they spread misleading narratives. Our open-access website will communicate, in real time, the impact of low-credibility content on Twitter, with planned expansion to Meta, Reddit, and Instagram subject to data access via CrowdTangle.

To augment discussions around the credibility of sources on Wikipedia, our software, SimPPL, enables real-time audits of the impact of 50 US-based news sources drawn from Iffy.news and Wikipedia's Perennial Sources list. SimPPL generates content-agnostic metrics describing how articles from these sources spread, so that discussions and decision-making (e.g. fact-checking, policy interventions, risk assessments) can draw on quantitative evidence of each source's online impact.


Link(s) to your resume or anything else (CV, GitHub, etc.) that may be relevant
  1. CV
  2. GitHub
  3. Unicode Research - current mentees
  4. SimPPL
  5. Unicode (advisory role; co-founder)


Is your project already in progress?

Yes, we have been tracking how coordinated campaigns may be promoting state-backed narratives from the known "pillars of disinformation" RT and Sputnik News. This was developed as a proof of concept for journalists at The Times (UK) through the JournalismAI fellowship. We presented a talk about this work at Truth and Trust Online 2022 and have built a scalable system to study it at https://simppl.org, with a demo available at https://demo.simppl.org under the 'Networks and Topics' tab. Here is a slide deck describing the work.


How is this project relevant to credibility and Wikipedia?

Sidestepping the unresolved debate over what is or isn't fake news, we create content-agnostic mechanisms to audit how online articles are shared. We limit this pilot project to 50 of the most popular US-based news domains listed in Iffy.news and Wikipedia's Perennial Sources list. We track the toxicity of the tweets sharing these articles, coordinated behavior in the networks sharing those tweets, and how both trend over time. Connecting content with its impact helps the community stay cognizant of the consequences of online news, especially the impact of Wikipedia itself (cf. Meier, 2022), including potential risks to articles from manipulative edit campaigns. Most importantly, we create a content-agnostic means to audit the impact of any news source on online social networks.
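To make these per-source metrics concrete, the sketch below aggregates per-tweet toxicity scores and share counts into hourly trends for each domain; the input schema and the toxicity_score helper are hypothetical stand-ins for whichever classifier and storage layer are actually used.

```python
import pandas as pd

def toxicity_score(text: str) -> float:
    """Hypothetical stand-in for a toxicity classifier (hosted or local model)."""
    raise NotImplementedError

def hourly_trends(tweets: pd.DataFrame) -> pd.DataFrame:
    """Aggregate per-tweet signals into hourly, content-agnostic trend metrics.

    Expects columns: created_at (timestamp), author_id, text, domain.
    """
    tweets = tweets.copy()
    tweets["toxicity"] = tweets["text"].map(toxicity_score)
    tweets["hour"] = pd.to_datetime(tweets["created_at"]).dt.floor("h")
    return (tweets.groupby(["domain", "hour"])
                  .agg(shares=("text", "size"),
                       unique_accounts=("author_id", "nunique"),
                       mean_toxicity=("toxicity", "mean"))
                  .reset_index())
```

A high ratio of shares to unique accounts within a short window is one simple, content-agnostic proxy for concentrated, and possibly coordinated, amplification.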


What is the ultimate impact of this project?

We provide an "arbitration" platform that connects news sources and articles with their impact across social channels, so that a concrete discussion can be had about the nature of their amplification. By connecting Iffy.news and Wikipedia's perennial sources with their real-time impact, we empower the Wikimedia community to contextualize the impact a source has in practice. For example, with The Times (UK) we studied 8,000 articles shared on Twitter from two sources known to publish Russian state-sponsored narratives, RT and Sputnik News. We are able to proactively surface what we suspect to be coordinated inauthentic behavior promoting their articles, making progress using public data on a problem that has plagued platforms like Twitter and Meta in times of civic volatility, as evidenced in the quarterly transparency reports they publish. Especially after the layoffs of civic integrity teams at Twitter and Meta and the emergence of a fragmented online landscape with multiple new platforms, it is imperative to create public systems that democratize the auditing of news sources at scale.


Can your project scale?

Yes. We operate in the cloud with a real-time tweet collection pipeline: Dataflow and BigQuery for real-time tabular data ingestion (80,000 accounts, 24M tweets), Neo4j for low-latency graph querying, and a ReactJS frontend that serves the metrics to end users via our website https://simppl.org, with a few live demo plots under testing available at https://demo.simppl.org.
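For illustration, the kind of low-latency co-sharing query the Neo4j layer serves might look like the sketch below; the node labels (Account, Article), the SHARED relationship, and the connection details are assumptions made for the example, not necessarily our production schema.

```python
from neo4j import GraphDatabase

# Connection details are placeholders.
driver = GraphDatabase.driver("neo4j://localhost:7687", auth=("neo4j", "password"))

# Assumed schema: (:Account)-[:SHARED]->(:Article {domain}).
# Find pairs of accounts that repeatedly share articles from the same domain,
# a simple co-sharing signal that can feed downstream coordination analysis.
CO_SHARING_QUERY = """
MATCH (a:Account)-[:SHARED]->(art:Article {domain: $domain})<-[:SHARED]-(b:Account)
WHERE a.id < b.id
RETURN a.id AS account_a, b.id AS account_b, count(art) AS co_shares
ORDER BY co_shares DESC
LIMIT $limit
"""

def top_co_sharers(domain: str, limit: int = 25) -> list[dict]:
    """Return the account pairs that most often co-share articles from a domain."""
    with driver.session() as session:
        result = session.run(CO_SHARING_QUERY, domain=domain, limit=limit)
        return [record.data() for record in result]
```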


Why are you the people to do it?

I am a Ph.D. candidate at New York University working at the Center for Social Media and Politics. I have worked on election misinformation at Twitter Civic Integrity, built a patent-pending trending-hashtag recommendation system at Adobe Research, and built production ML software at CERN. My team, comprising students I have mentored for 3+ years, and I care deeply about knowledge transfer and cross-domain efforts toward AI for social good, having created DJ Unicode, an active programming community with 200+ members. I have also led the NYU AI School for the past two years, where we teach AI/ML to 300+ undergraduate students from underserved communities.

The SimPPL team comprises engineers and social scientists who have worked together for 3-5 years, with expertise in audience analytics, hate-speech detection, and collaborations with nonprofits. We share the common goal of building open-access AI tools that help journalists, fact-checkers, and other non-technical stakeholders participate in the digital information ecosystem.


What is the impact of your idea on diversity and inclusiveness of the Wikimedia movement?

This project provides a conduit to invite more journalists and social science researchers into the Wikimedia community. We are developing projects with The Times (UK) and the Yale Daily News, previously worked with the Vermont Digger, and are part of the News Product Alliance, which selected us for its mentorship program, where we are advised by a Dow Jones product leader. We will tap into the Knowledge Integrity research arm at the Wikimedia Foundation to solicit expert feedback on prioritizing community goals (advised by Pablo Aragon and Barrett Golding). We also plan to deliver research seminars to students from diverse global communities interested in pursuing social network research with open data from Wikipedia and social platforms like Twitter, bolstering Knowledge Integrity efforts.


What are the challenges associated with this project and how you will overcome them?
  1. Collecting social media data at scale requires multiple API keys plus online compute and storage: we have designed a round-robin key-rotation algorithm to do this (see the sketch after this list) and demonstrated that our cloud data-processing architecture can collect data from 80,000 accounts, containing over 26,000,000 tweets, and ingest them into a database.
  2. Low-latency querying over knowledge graphs: we have developed a proof of concept using Neo4j that runs fast, efficient queries over our knowledge graph of millions of edges in seconds. We still need to extend these queries to cover more complex relationships.
  3. Visualizing results from large communities and networks: we have created a JavaScript-based frontend with network, topic, toxicity, and content-related plots for each article we audit. It is currently hosted under the 'Networks and Topics' tab of https://demo.simppl.org, but we expect to build a tool similar to the one we helped develop for The Times, which measures the levels of activity around the sharing of online articles from two known sources of manipulated narratives and disinformation, Russia Today and Sputnik News (see Perennial Sources); it is available as of this writing at https://parrot.report.
  4. Partnerships: in designing a tool, it is always beneficial to involve stakeholders early on. To this end, we have received confirmation from The Times (UK) that they will provide feedback to help us improve the platform, and from Barrett Golding of Iffy News, who will also support us in working with the Wikidata API.
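A minimal sketch of the round-robin idea from point 1, assuming a pool of Twitter API v2 bearer tokens and the public recent-search endpoint; the token handling and pacing here are simplified placeholders rather than our production scheduler.

```python
import itertools
import requests

# Pool of bearer tokens (placeholders); itertools.cycle yields them round-robin.
BEARER_TOKENS = ["TOKEN_A", "TOKEN_B", "TOKEN_C"]
_token_cycle = itertools.cycle(BEARER_TOKENS)

SEARCH_URL = "https://api.twitter.com/2/tweets/search/recent"

def search_recent(query: str, max_results: int = 100) -> dict:
    """Issue one recent-search request using the next token in the rotation."""
    token = next(_token_cycle)
    response = requests.get(
        SEARCH_URL,
        headers={"Authorization": f"Bearer {token}"},
        params={"query": query, "max_results": max_results},
        timeout=30,
    )
    response.raise_for_status()
    return response.json()

# Example: spread per-domain queries across the token pool.
for domain in ["example-news-domain.com", "another-domain.com"]:
    data = search_recent(f'url:"{domain}" -is:retweet')
    print(domain, data.get("meta", {}).get("result_count"))
```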


How will you spend your funds?

A. USD 4,800 (160 hrs): Salary for the Project (and Seminar) Lead

B. USD 2,400 (60 hrs): Salary for part-time Research Engineer

C. USD 2,800: Cloud compute services budget to process Wikipedia, Meta, and Twitter data via public APIs for 6 months (based on current spend)


How long will your project take?

6 months starting January 2023


Have you worked on projects for previous grants before?

Yes, I have successfully worked on smaller grants and received some cloud credits to support this project:
  1. The NYC Media Lab
  2. The Times (via JournalismAI)
  3. Algovera AI (twice)
  4. Google Cloud Platform
  5. Amazon Web Services