Grants:IEG/Accuracy Review of Wikipedias
Project idea
What is the problem you're trying to solve?
Create a mediawiki-utilities bot to find articles in given categories, category trees, and lists. For each such article, find passages containing (1) facts and statistics that are likely to have become out of date and have not been updated in a given number of years, and, optionally, (2) phrases that are likely to be unclear. Record the location and text of those passages on the page in question (using templates), on a bookkeeping page that uses the article names as headings, and/or in a database local to the bot.
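A minimal sketch of the article-gathering step, assuming the bot walks each category tree breadth-first. It does not call mediawiki-utilities or the MediaWiki API itself: `get_members` is a hypothetical callable standing in for whatever category lookup the bot ends up using (for example, a wrapper around the API's `list=categorymembers` query), and `max_depth` is an assumed limit on how far into subcategories to descend. Titles taken from lists could simply be added to the returned set.

```python
from collections import deque
from typing import Callable, Iterable


def collect_articles(
    roots: Iterable[str],
    get_members: Callable[[str], Iterable[str]],  # hypothetical lookup
    max_depth: int = 3,  # assumed subcategory depth limit
) -> set[str]:
    """Breadth-first walk of category trees, returning article titles."""
    articles: set[str] = set()
    seen: set[str] = set()
    queue = deque((root, 0) for root in roots)
    while queue:
        category, depth = queue.popleft()
        if category in seen:
            continue  # category graphs can contain cycles
        seen.add(category)
        for title in get_members(category):
            if title.startswith("Category:"):
                if depth < max_depth:
                    queue.append((title, depth + 1))
            else:
                articles.add(title)
    return articles
```

The `seen` set matters because real category graphs can loop back on themselves; without it the walk would revisit the same categories repeatedly.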
What is your solution?
Use a customizable set of keywords, regular expressions, and text-comprehensibility measures (or, optionally, the DELPH-IN LOGON parser [ http://erg.delph-in.net/logon ]) to find such passages for review. Before processing each article of interest, pre-compute the age of each word in the article using an algorithm at least as good as that in T89763#1066043 (to avoid the page-move and blanking issues described in, e.g., http://wikitrust.soe.ucsc.edu/talks-and-papers ).
Present flagged passages to one or more subscribed reviewers. Update the source template, if any, with the reviewers' responses, while keeping the original text as part of the template. When reviewers disagree, update the template, if any, to reflect the disagreement, and present the question to a third reviewer to break the tie.
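The matching step could be sketched as follows, assuming each candidate passage already carries per-word timestamps from the word-age pre-computation. The patterns in `DATED_PATTERNS` are illustrative guesses rather than a vetted list, and the two-year default is an assumed stand-in for the "given number of years". For the optional clarity check, this sketch swaps in the standard Flesch reading-ease formula with a crude vowel-group syllable counter; it is not the DELPH-IN parser.

```python
import re
from dataclasses import dataclass
from datetime import datetime, timedelta

# Illustrative, customizable patterns for statements that tend to go stale.
DATED_PATTERNS = [
    re.compile(r"\b(?:as of|since|in)\s+(?:19|20)\d{2}\b", re.IGNORECASE),
    re.compile(r"\b(?:currently|recently|at present|to date)\b", re.IGNORECASE),
    re.compile(r"\b\d[\d,.]*\s*(?:%|(?:percent|million|billion)\b)", re.IGNORECASE),
]


@dataclass
class Passage:
    text: str
    word_times: list[datetime]  # assumed output of the word-age pre-computation

    @property
    def last_changed(self) -> datetime:
        # A passage is only as fresh as its most recently edited word.
        return max(self.word_times, default=datetime.min)


def flag_stale(passages, max_age_years=2, now=None):
    """Return passages that match a dated pattern and were not edited recently."""
    now = now if now is not None else datetime.now()
    cutoff = now - timedelta(days=365 * max_age_years)
    return [
        p for p in passages
        if p.last_changed < cutoff and any(rx.search(p.text) for rx in DATED_PATTERNS)
    ]


def count_syllables(word: str) -> int:
    # Crude heuristic: each run of vowels counts as one syllable.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def flesch_reading_ease(text: str) -> float:
    """Flesch reading ease; lower scores indicate harder text."""
    words = re.findall(r"[A-Za-z']+", text)
    if not words:
        return 0.0
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    syllables = sum(count_syllables(w) for w in words)
    return 206.835 - 1.015 * len(words) / sentences - 84.6 * syllables / len(words)


def flag_unclear(passages, min_score=30.0):
    """Return passages whose readability falls below an assumed threshold."""
    return [p for p in passages if flesch_reading_ease(p.text) < min_score]
```

Using the newest word in a passage, rather than the page's last edit, is what makes the word-age pre-computation useful here: an unrelated edit elsewhere on the page does not make an old statistic look fresh.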
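The disagreement rule can be expressed as a small state function. This is a hypothetical sketch: the verdict strings (`"outdated"`, `"still accurate"`) and status labels are invented for illustration, and updating the on-wiki template is left out. It only shows that two agreeing reviewers settle a passage, a split sends it to a third reviewer, and the original text is stored unchanged.

```python
from collections import Counter
from dataclasses import dataclass, field


def review_status(verdicts: list[str]) -> str:
    """Next step for a flagged passage, given the verdicts collected so far."""
    if len(verdicts) < 2:
        return "needs reviewer"
    verdict, votes = Counter(verdicts).most_common(1)[0]
    if votes >= 2:
        return f"resolved: {verdict}"  # at least two reviewers agree
    if len(verdicts) == 2:
        return "disputed: needs tie-breaker"  # route to a third reviewer
    return "unresolved"  # e.g. three different verdicts


@dataclass
class FlaggedPassage:
    page: str
    original_text: str  # never overwritten; kept alongside the responses
    verdicts: list[str] = field(default_factory=list)

    def record(self, verdict: str) -> str:
        self.verdicts.append(verdict)
        return review_status(self.verdicts)
```

With free-form verdicts a three-way split is possible, so the sketch reports it as unresolved rather than guessing a winner.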
Project goals
This project aims to keep track of outdated facts and statistics. By the end of the project, the bot will be able to record the location and text of outdated data on the page in question (using templates), on a bookkeeping page that uses the article names as headings, and/or in a database local to the bot. Details about the project and the applicant are available at https://etherpad.wikimedia.org/p/accuracyreview
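For the bookkeeping-page option, the bot could render its findings as ordinary wikitext with one section heading per article. The input layout here (a mapping from article title to character offsets and passage text) is an assumption made for illustration, and `render_bookkeeping_page` is a hypothetical helper; `<nowiki>` keeps quoted passages from rendering as links or templates.

```python
def render_bookkeeping_page(flags: dict[str, list[tuple[int, str]]]) -> str:
    """Render flagged passages as wikitext, one level-2 section per article.

    `flags` maps an article title to (character offset, passage text) pairs.
    """
    lines: list[str] = []
    for title in sorted(flags):
        lines.append(f"== [[{title}]] ==")
        for offset, text in flags[title]:
            # Note: a passage that itself contains "</nowiki>" would break this.
            lines.append(f"* offset {offset}: <nowiki>{text}</nowiki>")
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"
```

Sorting titles keeps the page stable between bot runs, so successive revisions of the bookkeeping page diff cleanly.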
Project plan
Activities
1. Subtasks: document the updated schema, improve candidate passage queue management, improve the reviewer workflow, improve the reviewer reputation database and reporting, and include double-blinded identity and action codings for the reviewer reputation database.
2. Answer whether or not you plan to apply for an IEG grant (https://meta.wikimedia.org/wiki/Grants:IEG#ieg-applying):
Yes, I have submitted an IEG application for this project.
3. Name and email:
Shrutika Gulati shrutikagulati@gmail.com
4. Briefly describe your work style: how you plan to communicate progress, where you plan to publish your source code while you're working, how and where you plan to ask for help.
I work in a very organized way: I will divide the project into tasks and subtasks, each lasting at most one week. I will devote half of my time to investigation and coding and the other half to deployment, testing, and documentation.
I will report to my mentors weekly, providing a summary of the work done that week along with its documentation. I consider the mailing lists and IRC channels the best places to ask for help.
5. Please describe your experience with any other FOSS projects as a user and as a contributor.
I am a regular user of various Wikimedia projects, such as Wikipedia, Wikiquote, and Wiktionary, and it has been a great experience because of the accuracy and efficiency of the work done there.
6. What project(s) are you interested in (these can be in the same or different organizations)?
I want to contribute to "Accuracy Review of Wikipedias" as part of an IEG, and I am also looking for a project under the Outreach Program.
7. Do you have any past experience working in open source projects (MediaWiki or otherwise)?
I have not worked on any open source project professionally, but I have used many open source projects on my local machine and customized them to my needs.
8. Education completed or in progress:
I am a fourth-year student pursuing an integrated B.Tech + M.Tech degree in Computer Science and Engineering.
9. How did you hear about this program?
I have a keen interest in open source projects, so I keep looking for them. I was initially aware of GSoC, but after further browsing I learned about the Outreach Program and IEG.
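As one possible reading of the first activity listed above (candidate passage queue management and a reviewer reputation database with double-blinded identity coding), the following sketch uses Python's built-in `sqlite3`, `hmac`, and `hashlib` modules. The table layout, the HMAC-SHA256 pseudonym in place of usernames, and agreement with other reviewers as a reputation measure are all assumptions for illustration, not the project's actual schema; action coding is not covered.

```python
import hashlib
import hmac
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS passage (
    id INTEGER PRIMARY KEY,
    page TEXT NOT NULL,
    original_text TEXT NOT NULL,
    priority REAL NOT NULL DEFAULT 0
);
CREATE TABLE IF NOT EXISTS review (
    passage_id INTEGER NOT NULL REFERENCES passage(id),
    reviewer_code TEXT NOT NULL,  -- keyed pseudonym, never the username
    verdict TEXT NOT NULL
);
"""


def reviewer_code(username: str, secret: bytes) -> str:
    # HMAC-SHA256 yields a stable pseudonym per reviewer that cannot be
    # recomputed from a username without the secret key.
    digest = hmac.new(secret, username.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def next_passage(db: sqlite3.Connection, code: str):
    """Highest-priority passage this reviewer has not reviewed yet, or None."""
    return db.execute(
        """SELECT id, original_text FROM passage
           WHERE id NOT IN (SELECT passage_id FROM review WHERE reviewer_code = ?)
           ORDER BY priority DESC, id
           LIMIT 1""",
        (code,),
    ).fetchone()


def agreement_rate(db: sqlite3.Connection, code: str):
    """Share of this reviewer's verdicts that match other reviewers' verdicts."""
    row = db.execute(
        """SELECT AVG(CASE WHEN r.verdict = o.verdict THEN 1.0 ELSE 0.0 END)
           FROM review AS r
           JOIN review AS o
             ON o.passage_id = r.passage_id AND o.reviewer_code <> r.reviewer_code
           WHERE r.reviewer_code = ?""",
        (code,),
    ).fetchone()
    return row[0]  # None if the reviewer has no overlapping reviews
```

Keeping only the keyed pseudonym in the database means reputation reports can be published without exposing who reviewed which passage, as long as the secret stays with the bot operator.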
Budget
The total budget, as stated in the probox, is 9,000 USD. Funds will be allocated as work on the project proceeds.
Community engagement
Mentors: James Salsman, Fabian, and Maribel will guide me throughout the project.
Sustainability
At the end of the grant, I expect the project to have new scope for future work, and I wish to keep contributing to its growth as much as possible.
Measures of success
The project will be considered successful if we can keep track of outdated facts and statistics and complete these subtasks: document the updated schema, improve candidate passage queue management, improve the reviewer workflow, improve the reviewer reputation database and reporting, and include double-blinded identity and action codings for the reviewer reputation database.
Get involved
Participants
Mentors: James Salsman, Fabian, and Maribel
Applicant: Shrutika Gulati. I am currently working on natural language processing in Python, which I hope will be a major help in making this idea a success.
Community notification
Please paste links below to where relevant communities have been notified of your proposal, and to any other relevant community discussions.
Endorsements
Do you think this project should be selected for an Individual Engagement Grant? Please add your name and rationale for endorsing this project below! (Other constructive feedback is welcome on the discussion page.)
Yes, I think this project should be selected for an IEG because no one wants to gather outdated information from wiki pages, and this project will help the wikis maintain their accuracy.