Grants:Project/Automated Portuguese grammar checking edits via LanguageTool
Project idea
What is the problem you're trying to solve?
Portuguese Wikipedia is a fast-growing branch of Wikipedia. Unfortunately, its textual quality is currently low, since the various variants of Portuguese follow very different rules.
This reduces Portuguese Wikipedia's prestige and visibility, and disheartens many editors who see their work mixed with edits that have poor or inconsistent grammar.
Automated grammar checking, focused on rules that are accepted in all Portuguese variants, will avoid time-consuming edits and needless discussions among Portuguese Wikipedia contributors, and may increase the number of contributors who remain active in the community.
What is your solution?
I would like to reactivate the LanguageTool WikiCheck function for all languages and develop a Portuguese chunker or a strong disambiguation mechanism for LanguageTool, so that it allows automatic corrections on large sets of Wikipedia pages. After that, at regular intervals, the improved rule sets would be tested on a limited set of articles. If the number of valid corrections is far greater than the number of false positives (wrong corrections), the edits will be pushed. This cycle can be repeated, increasing the number of perfected rules over time.
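The test-then-push cycle described above could be sketched roughly as follows. This is only an illustration: the `RuleTrial` record, the rule IDs, and the `min_ratio` and `min_reviewed` thresholds are assumptions made for the example, not values defined by this proposal or by LanguageTool.

```python
from dataclasses import dataclass


@dataclass
class RuleTrial:
    """Manually reviewed results for one rule on the test articles."""
    rule_id: str          # hypothetical rule identifier
    valid: int            # corrections judged correct by reviewers
    false_positives: int  # corrections judged wrong by reviewers


def rules_to_push(trials, min_ratio=20.0, min_reviewed=50):
    """Return the IDs of rules whose trial results justify automated edits.

    min_ratio and min_reviewed are illustrative thresholds only.
    """
    accepted = []
    for t in trials:
        reviewed = t.valid + t.false_positives
        if reviewed < min_reviewed:
            continue  # too little evidence; keep testing this rule
        # Compare by multiplication so zero false positives needs no division
        if t.valid >= min_ratio * t.false_positives:
            accepted.append(t.rule_id)
    return accepted


trials = [
    RuleTrial("RULE_A", valid=198, false_positives=2),
    RuleTrial("RULE_B", valid=60, false_positives=40),
    RuleTrial("RULE_C", valid=10, false_positives=0),
]
print(rules_to_push(trials))  # ['RULE_A']
```

Rules that fail the check are not discarded; they simply stay in the test set until their disambiguation improves or more reviewed samples are available.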
The process can include Wikipedia contributors, and development can be prioritized according to editors' needs. A similar method has been tested in the past by Jaume Ortola with great success, and it is based on the groundwork by Daniel Naber: [FOSDEM 2014] How we found a million style and grammar errors in the English Wikipedia
The main difficulty with the Portuguese project is that the rule set and disambiguation need to be improved in order to make the automated edits effective.
I believe this is the best way to rapidly increase and maintain the quality of the entire Portuguese Wikipedia, while developing a tool that can be used in the future by the entire Portuguese-speaking community, even outside Wikipedia.
Project goals
Increase overall article quality by reducing grammar errors and making article style more uniform. Increase the number of corrections per unit of time. Secondarily, the project will increase the rate of content improvement, since editors will have more time available for content creation; it may also decrease contributor loss and marginally increase Wikipedia's prestige.
Project impact
How will you know if you have met your goals?
The correction metrics will be obtained from LanguageTool reports, which give the number of edits in the test pages and in the submitted edits. The number of false positives will be analyzed in the subset of test pages and extrapolated to the entire set of committed changes. The primary and secondary goals are achieved through continuous work and tuning of LanguageTool's rules and interpretation mechanisms. Secondary objectives, such as an increase in the article improvement rate, can be assessed by comparison with previous years using elementary statistical analysis. Contributor loss rate and satisfaction can be assessed through surveys of the most prolific editors and statistical analysis.
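As a rough illustration of how false positives counted on test pages might be extrapolated to all committed changes, the sketch below uses a Wilson score interval. It assumes the reviewed edits are a random sample of the committed edits, which the proposal does not guarantee; the function names and the numbers in the example are hypothetical.

```python
import math


def wilson_interval(errors, sample_size, z=1.96):
    """Approximate 95% Wilson score interval for a proportion."""
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    p = errors / sample_size
    denom = 1 + z * z / sample_size
    centre = (p + z * z / (2 * sample_size)) / denom
    half = z * math.sqrt(
        p * (1 - p) / sample_size + z * z / (4 * sample_size ** 2)
    ) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


def extrapolate_false_positives(errors, sample_size, total_edits):
    """Scale the sampled false-positive rate up to all committed edits."""
    low, high = wilson_interval(errors, sample_size)
    return {
        "point_estimate": round(errors / sample_size * total_edits),
        "low": math.floor(low * total_edits),
        "high": math.ceil(high * total_edits),
    }


# Hypothetical numbers: 6 wrong corrections found among 300 reviewed edits,
# extrapolated to 20,000 committed edits.
estimate = extrapolate_false_positives(errors=6, sample_size=300, total_edits=20000)
print(estimate)
```

Reporting the interval alongside the point estimate makes it clearer how much reviewing is needed before a rule's error rate can be trusted.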
Do you have any goals around participation or content?
General metrics may be too loosely related to this project at the moment. Hopefully, the number of content pages created will increase, but many confounding variables can make the results non-significant in the short term.
Project plan
Activities
- Community organization, bureaucratic work, and project rules documentation (3 - 6 months);
- Full-time dedication to setting up a viable WikiCheck server for all languages (15 days - 1 month);
- Create all the necessary interfaces to make regular automated edits viable (15 days - 1 month);
- Continuously improve the disambiguation and/or chunker in order to improve the accuracy of grammar replacements (entire period of the grant);
- Select the most accurate rules, so that they can start being used in the shortest possible time (entire period of the grant);
- Work with editors in order to allow continuous improvement of the grammar checking selection (entire period of the grant);
- After initial verification and setup, make regular Wikipedia revisions, in order to continuously improve grammar.
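One piece of the interface work listed above, applying only the corrections that come from an approved rule selection, might look like the sketch below. The match dictionaries follow the general shape of the JSON returned by LanguageTool's HTTP check endpoint (`offset`, `length`, `replacements`, `rule.id`), but the rule IDs, the sample sentence, and the `apply_matches` helper are hypothetical, and offsets are assumed to count characters the same way Python does.

```python
def apply_matches(text, matches, allowed_rule_ids):
    """Apply the first suggestion of each allowed match.

    Returns (new_text, number_of_applied_corrections).
    """
    selected = [
        m for m in matches
        if m["rule"]["id"] in allowed_rule_ids and m["replacements"]
    ]
    # Work from the end of the text so earlier offsets stay valid
    selected.sort(key=lambda m: m["offset"], reverse=True)
    applied = 0
    upper_bound = len(text)
    for m in selected:
        start, end = m["offset"], m["offset"] + m["length"]
        if end > upper_bound:
            continue  # overlaps a span already rewritten; skip it
        text = text[:start] + m["replacements"][0]["value"] + text[end:]
        upper_bound = start
        applied += 1
    return text, applied


text = "O livro livro é bom. Haviam muitos alunos."
matches = [
    {   # hypothetical rule ID for a repeated word
        "offset": text.index("livro livro"), "length": len("livro livro"),
        "replacements": [{"value": "livro"}],
        "rule": {"id": "EXAMPLE_WORD_REPETITION"},
    },
    {   # hypothetical rule ID for impersonal "haver"
        "offset": text.index("Haviam"), "length": len("Haviam"),
        "replacements": [{"value": "Havia"}],
        "rule": {"id": "EXAMPLE_HAVER_IMPESSOAL"},
    },
]

# Only the repetition rule is in the approved selection here
print(apply_matches(text, matches, {"EXAMPLE_WORD_REPETITION"}))
```

Keeping the allow-list as an explicit argument means editors can widen or narrow the rule selection without touching the code that rewrites the text.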
Budget
- Community organization, bureaucratic work, and rule documentation - part-time work - up to six months, $2,000 per month
- WikiCheck reactivation and setup - full-time development work - up to one month, $3,500
- Creation or tuning of a batch editor - full-time development work - up to one month, $3,500
- LanguageTool continuous improvement, mostly on disambiguation and false positive reduction - during the entire period of the grant - $3,500 for each extra month
- Server and Internet costs - 12 months - up to $600 (possibly lower, depending on service quality and/or inexpensive alternatives)
- Taxes and legal costs - up to $7,000 for 6 months
- Secondary goal:
  - Maintaining the project part-time for up to a year ($1,500 per month):
    - Continue to work on LanguageTool rules and accuracy-enhancing mechanisms;
    - Continue to expand WikiCheck usage, so it covers more articles or other languages;
    - Create and maintain strong bridges between the Portuguese Wikipedia community and the LanguageTool development team.
Community engagement
LanguageTool is already the most important free and open-source general-purpose grammar checker.
After the last 6 months of part-time work, it has become the most complete freely available style and grammar checker for the Portuguese language.
The previous version has already been highlighted in national magazines.
I intend to work with the community, and with its help the tool's use will become even more widespread. The metrics I have access to already show an increase in users in Portuguese-speaking countries of up to six times since the 3.5 release, and a similar rate of increase is expected in the near future.
Get involved
Participants
- Coordinator and Programmer: Tiago Santos (Tiago Santos): biologist (environmental sciences); biology, mathematics and English tutor; open-source project translator (GNOME, Launchpad, Transifex); developer (LibreOffice and LanguageTool)
Community notification
Please paste links below to where relevant communities have been notified of your proposal, and to any other relevant community discussions. You are responsible for notifying relevant communities of your proposal, so that they can help you! Depending on your project, notification may be most appropriate on a Village Pump, talk page, mailing list, etc. Need notification tips?
Endorsements
Do you think this project should be selected for a Project Grant? Please add your name and rationale for endorsing this project below! (Other constructive feedback is welcome on the discussion page).