Research:Wikipedia Inconsistency Detection

From Meta, a Wikimedia project coordination wiki

This page documents a research project in progress.
Information may be incomplete and change as the project progresses.
Please contact the project lead before formally citing or reusing results from this page.


The Stanford OVAL lab is developing a research tool to help Wikipedia editors identify and resolve factual inconsistencies across articles.

We are seeking 25–30 Wikipedia editors to test our tool and give feedback that will help us evaluate and improve it. Each participant will evaluate about 50 statements. We greatly appreciate your participation.

+++ Interested? Please fill out this form! +++

We will provide an online tool that helps you systematically look for inconsistencies in Wikipedia. Its features include fact extraction and an improved Wikipedia search. The tool is currently a research prototype. With your support, we aim to develop a refined version that will be freely available to the broader community.

Stanford OVAL Wikipedia Inconsistency Detection Tool Prototype

About Stanford OVAL

Stanford OVAL is a research lab at Stanford University. Its previous research projects include Storm and WikiChat. You can learn more at Stanford OVAL — Projects.

Project Information

What are Factual Inconsistencies?

Factual inconsistencies in Wikipedia articles occur when information conflicts between different pages. This can happen when different editors work independently on related topics, or when one article is updated with new information while another isn't. Identifying these inconsistencies is the first step toward resolving them.

The inconsistencies can range from obvious contradictions to subtle ones that require logical inference to uncover. Here are some examples:

Article 1 | Article 2 | Explanation
Domain name | Top-level domain | Direct contradiction! One article claims DNS was divided into two groups, while the other claims it was divided into three.
Arnold I, Lord of Egmond | Leiningen family | Requires reasoning. Though not immediately obvious, Jolanthe of Leiningen (article 1) and Yolantha (article 2) appear to be the same person: both have the same mother (Jolanthe of Jülich) and the same death date. The articles therefore conflict on the name of the father (Frederick VII or VIII).
Codeine | Equianalgesic | Needs calculation. The equianalgesic chart shows equivalent doses of pain medications. The articles disagree on the codeine conversion: one implies that ~66.67 mg of codeine PO is equivalent to 10 mg of oral morphine, using the formula (200 mg of codeine PO) / (30 mg of morphine PO) * (10 mg of morphine PO), while the other gives 100–120 mg of codeine PO for the same oral morphine dose. Note that PO means the medication is taken orally.
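
The arithmetic behind the codeine example can be checked with a few lines of Python. This is only an illustrative sketch: the numbers are the ones quoted in the example above, while the function and variable names are made up here and are not part of the tool.

```python
# Figures quoted in the codeine example; all names are illustrative only.
CODEINE_PO_MG = 200.0          # codeine PO dose in the first article's ratio
MORPHINE_PO_MG = 30.0          # morphine PO dose it is paired with
TARGET_MORPHINE_PO_MG = 10.0   # morphine PO dose used for the comparison


def implied_codeine_dose(codeine_mg, morphine_mg, target_morphine_mg):
    """Scale a codeine:morphine ratio to a target morphine dose."""
    return codeine_mg / morphine_mg * target_morphine_mg


def within_range(value, low, high):
    """Return True if value lies in the closed interval [low, high]."""
    return low <= value <= high


implied = implied_codeine_dose(CODEINE_PO_MG, MORPHINE_PO_MG, TARGET_MORPHINE_PO_MG)

# Range stated by the second article for the same 10 mg oral morphine dose.
stated_low, stated_high = 100.0, 120.0
consistent = within_range(implied, stated_low, stated_high)

print(f"Implied codeine PO dose: {implied:.2f} mg")
print(f"Within the stated 100-120 mg range: {consistent}")
```

The implied dose falls outside the stated range, which is what makes this pair of statements a candidate inconsistency. Deciding which figure is correct still requires checking the underlying sources.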


How Does Our Tool Work?

Suppose we want to check whether the article “Domain name” contains any facts that are inconsistent with other articles on English Wikipedia. The tool first uses a large language model (LLM) to break the “Domain name” article into short statements, such as “The domain name space was divided into two main groups of domains.”, along with dozens of others.

The tool then provides a search feature and the background information on the topic needed to look for inconsistencies across the entire English Wikipedia. Some of these features use LLMs, with the goal of helping each editor find more inconsistencies with less time and effort.
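
The two steps described above (splitting an article into statements, then searching other articles for related statements) can be sketched roughly as follows. This page does not describe the tool's actual prompts, models, or search backend, so everything here is an assumption: a naive sentence splitter stands in for the LLM, simple word overlap stands in for the search feature, and the second sentence of each toy article body is invented for illustration rather than quoted from Wikipedia.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Statement:
    article: str  # title of the article the statement came from
    text: str     # the short statement itself


def extract_statements(article, body):
    """Stand-in for the LLM step: naive split after sentence-ending punctuation."""
    parts = re.split(r"(?<=[.!?])\s+", body.strip())
    return [Statement(article, part) for part in parts if part]


def tokens(text):
    """Lowercased alphabetic words in a text, as a set."""
    return set(re.findall(r"[a-z]+", text.lower()))


def search(query, corpus, min_overlap=3):
    """Stand-in for the search feature: rank statements from other
    articles by how many words they share with the query statement."""
    query_words = tokens(query.text)
    hits = []
    for statement in corpus:
        if statement.article == query.article:
            continue  # only compare against other articles
        overlap = len(query_words & tokens(statement.text))
        if overlap >= min_overlap:
            hits.append((overlap, statement))
    hits.sort(key=lambda hit: hit[0], reverse=True)
    return [statement for _, statement in hits]


# Toy corpus (hypothetical text, not real article content).
pages = {
    "Domain name": (
        "The domain name space was divided into two main groups of domains. "
        "Domain names are used in networking contexts."
    ),
    "Top-level domain": (
        "The domain name space was divided into three main groups of domains."
    ),
}

statements = [
    s for title, body in pages.items() for s in extract_statements(title, body)
]
query = statements[0]
candidates = search(query, statements)

for candidate in candidates:
    print(f"[{candidate.article}] {candidate.text}")
```

In this sketch the search only surfaces candidate statements for a human to review. Judging whether two statements actually contradict each other is left to the editor, in line with the goal described above.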

Stanford OVAL Wikipedia Inconsistency Detection Task

Timeline

We aim to begin working with editors before the end of January.
