
Article validation

From Meta, a Wikimedia project coordination wiki

Article validation (AV) refers, in general, to proposed mechanical systems that assist editors in exercising corrective editorial decisions over articles, while preserving the current open or "democratic" creation and editing of articles.

Currently the only mechanism for "validation" is the open wiki model, which allows users to alter, adapt, reform, or revert changes made during a previous edit.

The wiki model is synonymous with openness, i.e. it is one wherein corrective measures are post-active (done after the fact). While the wiki model has led to a great deal of success in building an encyclopedia from scratch, the growth of the project has exceeded the size of its core editorship, and in controversial areas this openness has been seen as an obstacle to producing a "stable" or "professional" encyclopedia.

In essence, AV is a plan to implement a functional mechanism that would move Wikipedia's modus operandi away from the idealistic concept of the completely open model, and give trusted editors an important tool that mechanically assists them in repelling vandalism. Because such a change could theoretically affect the very philosophy by which Wikipedia has thus far succeeded, such a mechanism must be well thought out, openly corrected, and thoroughly explained.

This feature is in MediaWiki 1.5 and is expected to be live on en: Wikipedia in 1.5. See also Article validation possible problems.

Current situation

Only users approved by the board can get accounts. See Request for an account on the Foundation wiki. Unapproved or draft edits are stored on Meta.

Other wikis

Other wikis do not have approval systems that affect editing or the appearance of versions, but do have some quality control measures (peer review, featured articles, lists of stubs/substubs, etc.).

Some of these are linked to, among other things, from Wikimedia projects.

The Foundation wiki allows full use of HTML. The other wikis allow limited use of HTML, as the MediaWiki software normally does.

General issues

Applications

Basically, article validation can be used for three things:

  1. the Foundation wiki
  2. the other websites
  3. static publication (e.g. paper, CDs/DVDs, WikiReaders)

Similar or different validation systems could be used for these.

Foundation wiki

The Foundation wiki works differently because:

  • It allows full use of HTML
  • Everything on that wiki is official. --Daniel Mayer 16:28, 17 Feb 2005 (UTC)
    But some things on other wikis are official and they are not protected. Brianjd 03:11, 2005 Feb 23 (UTC)
    Actually, I'm not sure that everything on that wiki is official. Are the discussion pages official? Brianjd 02:57, 2005 Feb 24 (UTC)
    You're confusing the issue. The Foundation is bound by statements made on the Foundation wiki. Users here are not bound by statements made on Wiki. (With the exception of policy pages and so forth.) Without clearly stating what you mean by "official", this debate is meaningless. Grendelkhan 20:20, 12 Apr 2005 (UTC)

Why Meta should not be used for unapproved or draft edits

  • There are then two versions of each page:
    • Both versions have to be checked to get the latest version.
    • The versions have to be synchronised regularly.
    • The page history is in two places.
  • The Meta version may not display properly, because:
    • it cannot use HTML in the same way (see above), and
    • it does not have all the templates.
  • It spams recent changes.

Static content

We have started producing static content (cf. our WikiReaders). When static content is distributed (CD-ROM, paper, PDF), an error in the document can't be corrected by the reader and will stay visible forever. This may be bad for Wikimedia's public image, and may cast doubt on the validity of all of our content.

This seems to indicate that any content officially distributed by Wikimedia should be carefully peer reviewed, or have a prominent notice indicating the nature of the content (accepting contributions from anyone).

On the other hand, other static content contains errors, and the reputations of the publishers don't seem to be tarnished, normally.

HTML

Do we want to restrict the use of HTML by default? See discussion on English Wikipedia. Brianjd 03:25, 2005 Mar 6 (UTC)

Vandalism

  • Some readers are worried about the reliability of our content, which is at times distorted by vandals.
It seems to me that is a different issue than reliability. One is crime, the other is accuracy or efficacy.
  • Better peer review might increase our audience and improve our reputation.
  • Article validation might help prevent vandalism from being seen and indicate whether an article's content is vandalism or not.
Or it might not. Isn't there a big difference between validation and removal of vandalism? The vast majority of vandalism is very obvious and any editor can handle it.
  • Article validation might be seen as a challenge and therefore increase vandalism.

Base of editors

  • Some potential expert editors refuse to edit, because they think their content will be damaged by vandals or non-experts. Providing a checking service might help them feel more confident with the process.
  • Some current editors may choose not to participate if the process becomes more complicated.
  • On the well-developed wikis, there is an increasing feeling of protectiveness in some areas. A set of editors sometimes forms a protection team, aggressively undoing or edit warring over any changes they dislike, thereby preventing the natural evolution of the wiki process. Providing them with the certainty that at least one approved version is available might help them be more gentle with newbies, or with editors holding a different bias from theirs.
    Huh? Any examples? Why are these users not talked to, or if they don't respond to that, banned? Brianjd | Why restrict HTML? | 08:33, 2005 Mar 20 (UTC)

Versions

The Cathedral and the Bazaar is completely irrelevant to Wikipedia. An open-source program typically has one or few maintainers who accept good contributions, reject bad contributions, release stable versions, and who don't have to fork the entire universe for their variant of an individual program to become widely used. If open-source software was "wiki-like", it would be a disaster. -- Mpt, 2005-12-18
  • All versions are currently non-editable and non-deletable. It is only possible to create new versions.
  • It is not necessary to hide the most-recent version from viewers.

Advantages

The advantages of an approval mechanism of the sort described are clear and numerous:

  • We will encourage the creation of really good content.
  • It makes it easier to collect the best articles on a wiki and create completed "snapshots" of them that could be printed and distributed, for example. This issue is central to Pushing to 1.0.

Disadvantages

  • People may not want to make small contributions if they think well-written articles, with references, are expected.
  • An expert-centered approval mechanism might be considered a hierarchic methodology, in contrast with bazaar-type open-source projects like Wikipedia, which are known to achieve good results (e.g. Linux) through aggressive peer review and openness ("with enough eyeballs, all errors are shallow"). It can be argued that the very reason Linux has become so reliable is the radical acceptance of, and to some degree respect for, amateurs' and enthusiasts' work of all sorts. (Note: This argument is currently somewhat backwards. Code does not get into the Linux core tree unless the patch is approved by Linus Torvalds, and Linus is only likely to accept a patch if it has been pre-approved by the relevant subsystem maintainers. All patches must actually function as advertised, and are also reviewed for compliance with coding standards. Linux vendors apply their own quality control to the patches they include in their distributed kernels. If Wikipedia wants to model itself on Linux, it needs to be understood that Linux development is not a free-for-all: potential contributions are scrutinised carefully before being accepted.)
  • Experts have controversies among themselves: for example, many subjects in medicine and psychology are highly debated. By giving a professor the free hand in deciding whether an article is "approved" or "non-approved" there is a risk of compromising the NPOV standards by experts' over-emphasizing their specific opinions and area of research.
  • Finding an expert who corresponds to a certain article can sometimes be troublesome:
    • Can a Ph.D. in applied mathematics "approve" articles on pure mathematics? Or, more strictly, should someone be accepted as an approver only if he or she has done research on the specific subject being approved? Who will decide whether a person is qualified to approve?
    • Some obscure or day-to-day topics don't have any immediate "expert" attached to them. Who will approve articles on hobbies, games, local cultures etc.?
  • Validation implies that acceptance of an article must be indicated. The opposite may be more in line with the current Wikipedia strategy: rejecting versions that are not acceptable. In most cases rejection does not require experts at all.
  • Experts are inevitably rare. That means the pool of people suitable to validate articles is small, and this is a scalability issue. Wikipedia is huge, so scalability is very important. In other words, there's no point in having a validation mechanism if nothing ever gets validated (see Nupedia).

Is it needed?

These are arguments presented for why an additional approval mechanism is unnecessary for Wikimedia Foundation wikis:

  • They already have an approval mechanism; although anyone can edit any page and experts and amateurs of all levels can be bold and contribute to articles, communal scrutiny is an approval mechanism, though not as centrally-controlled as peer review.
  • Imperfect as they currently stand, they already foster a more sophisticated critical approach to all sources of information than simple reliance on "peer-reviewed" authority. Low-quality articles can be easily recognized by a reader with little or no experience in reading Wikipedia, by applying some basic critical thinking:
    • The style may sound biased, emotional, poorly written, or just unintelligible.
    • Blanket statements, no citations, speculative assertions: any critical reader will be wary of giving too much credit to such an article.
    • The history of an article shows much of the effort and review that has gone into writing it, and who the writers are and how qualified they are (users seem to put some biographical information about themselves on their user pages).
    • A sophisticated reader soon learns that the Talk page is often enlightening as to the processes that have resulted in the current entry text.
  • Cross-checking with other sources is an extremely important principle for good information gathering on the internet! No source should be taken as 100% reliable.
  • Some "authoritative" and "approved" encyclopedias don't seem to live up to their own claims of credibility. See errors in Encyclopædia Britannica that have been corrected in Wikipedia.
  • The very idea of an article being "approved" is debatable, especially on controversial topics, and can be seen as an unreachable ideal by some.

Proposed basic requirements

Among the basic requirements an approval mechanism would have to fulfill in order to be adequate are:

  • The approval must be done by experts on the material being approved. A trust model?
  • There must be clear and reasonably stringent standards that the experts are expected to apply.
  • The mechanism itself must be genuinely easy for the experts to use or follow. Nupedia's experience seems to show that a convoluted approval procedure, while it might be rigorous, is too slow to be of practical use.
  • The approval mechanism must not impede the progress of Wikipedia in any way. It must not change the wiki process, but rather be an "add-on".
  • Must not be a bear to program, and it shouldn't require extra software or rely on browser-specific stuff like Java (or JavaScript) that some users won't have. A common browser platform standard must be specified for them, preferably low level but not so low level that the machines don't have CDs.
  • Must provide some way of verifying the expert's credentials—and optionally a way to verify that he or she approved the article, not an imposter.

Some results we might want from an approval system are that it:

  • makes it possible to broaden or narrow the selection of approvers (e.g., one person might want only authors who have PhDs, while another would allow anyone who has made an effort to approve any articles),
  • provides a framework for multiple sets of "approvals": not just one set of approvers or one approval rating, but different sets to allow for different standards (as with consumer products, different safety approvals, scuba diving certifications, etc.).

Here is another proposed system:

  • Add a "rate this change!" feature to the software. Anyone is allowed to rate any change on a scale of -3 to 3, with -3 meaning "ban this person", -1 meaning "small errors", 0 meaning "no opinion", +1 meaning "small improvement" and +3 meaning "this is a great contribution". Even a rating of 0 is useful, because it would mean the change was inspected and no egregious errors or sabotage was found.
  • The point is just to start collecting the data. After the data is collected, it should be straightforward to figure out a trust rating system based on the data, and probably any sensible method would work.
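
As an illustration only, the data collection step could be sketched as follows. The class name `RatingLog`, its methods, and the use of a plain mean as an author score are assumptions made for this example; the proposal itself leaves the eventual trust rating method open.

```python
from collections import defaultdict
from statistics import mean

# Rating scale taken from the proposal: -3 ("ban this person") to +3
# ("this is a great contribution"), with 0 meaning "no opinion".
MIN_RATING, MAX_RATING = -3, 3


class RatingLog:
    """Hypothetical in-memory store for "rate this change!" data."""

    def __init__(self):
        self._authors = {}                 # change_id -> author of the change
        self._ratings = defaultdict(list)  # change_id -> [(rater, rating), ...]

    def record_change(self, change_id, author):
        self._authors[change_id] = author

    def rate(self, change_id, rater, rating):
        if change_id not in self._authors:
            raise KeyError(f"unknown change: {change_id}")
        if not isinstance(rating, int) or not MIN_RATING <= rating <= MAX_RATING:
            raise ValueError("rating must be an integer from -3 to 3")
        self._ratings[change_id].append((rater, rating))

    def inspected(self, change_id):
        # Even a rating of 0 shows that somebody looked at the change.
        return len(self._ratings.get(change_id, [])) > 0

    def author_score(self, author):
        # Assumption: a plain mean over every rating the author's changes
        # have received. Returns None when there is no data yet.
        values = [
            rating
            for change_id, change_author in self._authors.items()
            if change_author == author
            for _, rating in self._ratings.get(change_id, [])
        ]
        return mean(values) if values else None
```

A real trust metric would probably also weight each rating by the rater's own standing; the plain mean here only shows that the collected data is enough to start from.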

Definitions

approval mechanism
mechanism whereby articles are individually marked and displayed, somehow, as "approved"
facet of validity
one of the following - accurate (fact-checked), neutral (NPOV), well-written (grammar-checked, spell-checked, proper style, clear), and complete (coverage suitable to the subject's importance)
reviewed version
article version whose quality has been reviewed by at least one editor (in each facet)
validated version
reviewed version whose quality is validated, on the basis of its content and outstanding reviews, by a (qualified) user (while its aggregate review is not negative in any facet)
reviewed/validated article
article, at least one of whose versions has been reviewed/validated
public version
version of an article shown to users who are not logged in
The public version need not be any different than the most-recent-edited version. The software can refer back to the most-recent-validated version and color code the differences from it in the displayed version.
the quantity of it should not be measured - Burg.
really? It is important that our article on Geology both showcase the depth of our relevant content, and do justice to the importance of the broad subject. I agree that evaluations of completeness should not take place on the same linear scale as evaluations of correctness and quality of writing. +sj+
This appears to have been moved from Wikipedia (Geology). Brianjd 05:39, 2005 Feb 17 (UTC)
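
The definitions of reviewed and validated versions above can be read as a small data model. The following is a minimal sketch under stated assumptions: the `Version` class, its method names, and the convention that review scores are integers whose sum per facet is the "aggregate review" are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# The four facets of validity named in the definitions.
FACETS = ("accurate", "neutral", "well-written", "complete")


@dataclass
class Version:
    version_id: int
    # facet -> list of integer review scores (negative means a bad review;
    # this scoring convention is an assumption, not part of the definitions)
    reviews: Dict[str, List[int]] = field(
        default_factory=lambda: {facet: [] for facet in FACETS}
    )
    validated_by: Optional[str] = None

    def add_review(self, facet: str, score: int) -> None:
        if facet not in FACETS:
            raise ValueError(f"unknown facet: {facet}")
        self.reviews[facet].append(score)

    def is_reviewed(self) -> bool:
        # Reviewed: at least one review in each facet.
        return all(self.reviews[facet] for facet in FACETS)

    def aggregate(self, facet: str) -> int:
        return sum(self.reviews[facet])

    def validate(self, user: str, user_is_qualified: bool) -> bool:
        # Validated: a reviewed version, marked by a qualified user, whose
        # aggregate review is not negative in any facet.
        ok = (
            user_is_qualified
            and self.is_reviewed()
            and all(self.aggregate(facet) >= 0 for facet in FACETS)
        )
        if ok:
            self.validated_by = user
        return ok

    def is_validated(self) -> bool:
        return self.validated_by is not None


def article_is_validated(versions: List[Version]) -> bool:
    # A validated article has at least one validated version.
    return any(version.is_validated() for version in versions)
```

How a user comes to count as "qualified" is deliberately left outside the sketch, since that is the open question discussed in the next section.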

Who can Validate?

  • set up a validation committee (a group of trusted, interested wikipedians) - requires separate "validator access" for those who have been approved to serve on these validating committees?
  • any user or group of users satisfying some statistical requirement (combination of time since registration, # of article edits, # of total edits)
  • any user or group of users
    • Any user, but different users' validations are worth different amounts, based on trust metrics. (I'm doing some more research into this now, and will post back if I come up with much.) Lbs6380 04:07, 14 Feb 2005 (UTC) --- As promised: my ideas. Feel free to copy them here and play around with them.
  • Administrators (who are trusted members of the community), and any user who has been marked as trustworthy by 3 administrators.
  • automatic validation:
    • after a certain # of (qualified?) users have edited the article
    • after a member of a certain group has edited the article
    • after the version has remained current for some time (therefore assuming the previous edit was legitimate and not vandalism)
  • separate validation restrictions for different kinds of validation (e.g., creating factual-validation committees for each major category, based on experience/expertise)
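
Two of the options above, the statistical requirement and automatic validation after a version has remained current for some time, are simple enough to sketch. Every threshold below (`MIN_ACCOUNT_AGE`, `MIN_ARTICLE_EDITS`, `MIN_TOTAL_EDITS`, `STABLE_PERIOD`) is a made-up placeholder, and the `EditorStats` record is hypothetical; the list above names the criteria but not their values.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class EditorStats:
    registered: datetime
    article_edits: int
    total_edits: int


# Placeholder thresholds, chosen only for illustration.
MIN_ACCOUNT_AGE = timedelta(days=90)
MIN_ARTICLE_EDITS = 200
MIN_TOTAL_EDITS = 500
STABLE_PERIOD = timedelta(days=7)


def may_validate(stats: EditorStats, now: datetime) -> bool:
    """Statistical requirement: account age plus article and total edit counts."""
    return (
        now - stats.registered >= MIN_ACCOUNT_AGE
        and stats.article_edits >= MIN_ARTICLE_EDITS
        and stats.total_edits >= MIN_TOTAL_EDITS
    )


def auto_validated(saved_at: datetime, now: datetime, superseded: bool = False) -> bool:
    """Automatic validation: the version is still current and has stayed so
    for at least STABLE_PERIOD."""
    return not superseded and now - saved_at >= STABLE_PERIOD
```

Neither check says anything about subject knowledge; both measure only longevity and activity.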

(Note that few of the above presently relate to the level of knowledge about the subject of a 'validator', merely of their longevity or ability in wiki editing. This may not be appropriate for some subjects --VampWillow 11:28, 5 Dec 2004 (UTC))

How to Validate?

  • Validation via extra metadata - let users add input about various facets of an article, and summarise that in terms of whether and how thoroughly the article has been validated.
  • Validation by committee - have an editorial board which selects topical committees which go around setting a single 'validated' flag on articles once they have reached maturity.
  • Weighting and averaging reviews - Require supermajorities among trusted reviewers for key aspects of articles; use a crude trust metric to decrease the leverage of mischievous reviewers.
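
The third approach, weighting and averaging reviews, might look roughly like the sketch below. The function names, the trust weights between 0 and 1, and the two-thirds threshold are all assumptions for the sake of the example; no particular trust metric or supermajority size is specified above.

```python
from typing import Dict, Optional


def weighted_approval(votes: Dict[str, bool], trust: Dict[str, float]) -> Optional[float]:
    """Trust-weighted share of approving reviewers.

    votes maps reviewer -> True (approve) or False (reject).
    trust maps reviewer -> weight; reviewers missing from it count as 0.
    Returns None when no reviewer carries any weight.
    """
    total = sum(trust.get(reviewer, 0.0) for reviewer in votes)
    if total == 0:
        return None
    in_favour = sum(
        trust.get(reviewer, 0.0) for reviewer, approve in votes.items() if approve
    )
    return in_favour / total


def passes_supermajority(
    votes: Dict[str, bool], trust: Dict[str, float], threshold: float = 2 / 3
) -> bool:
    share = weighted_approval(votes, trust)
    return share is not None and share >= threshold
```

Because low-trust accounts carry little weight, several mischievous reviewers voting together still cannot easily outvote one well-trusted reviewer, which is the leverage reduction the bullet describes.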

Suggested validation processes

See Article validation proposals. Also, Flagged revisions on English Wikipedia.

Discussion of the proposals

See Article validation proposals talk page.

Other proposals

For the Wikipedia page on this topic, please see en:Wikipedia:WikiTrust and its talk page.

I started off writing authority metric as a section of this page, but it was getting a bit long, so I moved it to its own article. Please check it out, it's closely related to discussion above. -- Tim Starling 03:02, 17 Nov 2004 (UTC)

IMHO, validation is 90% social and 10% technical. The social (or, if you prefer, political) part is "whom do we trust--what people, or what institution comprised of people--to certify that an article (as of such-and-such a date and time) is accurate or complete or whatever?" The technical part is "how do we store those certifications in the database and communicate them to the users?" I humbly submit article endorsement as a way to deal with the technical part. Sethg 16:31, 18 Nov 2004 (UTC)

Why not use UCSC WikiTrust?? It's available now as an extension to MediaWiki and automagically color codes edits based on trust metrics.

Two kinds of proposals and their structural relationship

There are two basic, completely different types of "article validation" being proposed.

  1. Versions: Every article has one "best version", called "public version" in the definition section above. The contributors to that article in particular (and possibly viewers) vote on this.
  2. Articles: A subset of articles is selected as "featured articles" or "reviewed articles" or "vetted articles". The entire community (or a voluntary subset thereof) votes on candidates.

These two different structures are not mutually exclusive. On the contrary, they work well together. They can be connected easily:

  • A subset of public versions are selected as "featured public version"s or "reviewed public version"s or "vetted public version"s. The entire community (or a voluntary subset thereof) votes on candidates.

The only ambiguity is: after a public version of an article is selected as a featured/reviewed/vetted public version, when a new public version is selected by the contributors to that article, does that new public version become the new community-reviewed public version, or does it have to be renominated? I would think that it would be renominated.

That being said, the community selection of a subset of public versions to be vetted public versions does not require software implementation. It is already being done on en via "featured articles" just fine.

The selection of a public version from the article history (or some have proposed branching - which can be done in the vetted public version process) does require software implementation and there are a lot more ways to go about it.

Given the above, I think that both kinds should be implemented, but because a "public version"

  • necessarily precedes the vetted public version,
  • requires software implementation, and
  • is a more open question,

"public version"s should be the focus of these pages. Kevin Baastalk 23:56, 8 April 2006 (UTC)

Seconded except for needing software implementation. Identification of public versions is possible using talk pages and wikipedia namespace. - Samsara 09:51, 15 April 2006 (UTC)
Then let me say, rather, that with software implementation, it can be done with considerably less effort, and cannot be disrupted or circumvented. Kevin Baastalk 19:14, 16 April 2006 (UTC)