
Community Wishlist Survey 2021/Editing/Spellchecker

From Meta, a Wikimedia project coordination wiki

Spellchecker

  • Problem: One of the most important aspects of the copy-editing workflow for users is finding and fixing spelling mistakes and typos.
  • Who would benefit: Editors, who would have less frustration in their work, and readers, who would read higher-quality articles.
  • Proposed solution: Persian Wikipedia has a gadget called Check Dictation that I expect could serve as inspiration and be turned into an extension. When an editor who has enabled the gadget views an article, they see a list of mistakes at the top of the page, and the mistakes are color-coded inside the article. It uses different colors for different issues: typos, bad wikitext, informal words, links to disambiguation pages, and many more. Here's an example: File:Rechtschreibung-fawiki.png. You can also define a per-article list of accepted words (an example). The code for the gadget can be found here, but it is heavily hard-coded to fawiki and could be improved drastically.
  • More comments:
  • Phabricator tickets:
  • Proposer: Amir (talk) 18:31, 16 November 2020 (UTC)
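
The behaviour described in the proposal (issues grouped by type, plus a per-article list of accepted words) can be illustrated with a minimal sketch. This is not the Check Dictation code: the word lists, the `Issue` type, and the `find_issues` function are hypothetical names made up for illustration, and a real tool would load a full dictionary for the wiki's language.

```python
import re
from dataclasses import dataclass

# Hypothetical toy word lists; a real tool would use a full dictionary.
KNOWN_WORDS = {"the", "editor", "fixed", "a", "typo", "in", "article"}
INFORMAL_WORDS = {"gonna", "kinda"}


@dataclass(frozen=True)
class Issue:
    word: str       # the word as it appears in the text
    category: str   # "typo" or "informal"; a UI could map this to a color
    offset: int     # character offset of the word in the text


def find_issues(text, page_allowlist=frozenset()):
    """Return possible issues, skipping words accepted for this page."""
    issues = []
    # [^\W\d_]+ matches runs of letters, including non-Latin scripts.
    for match in re.finditer(r"[^\W\d_]+", text):
        word = match.group()
        key = word.lower()
        if key in page_allowlist:
            continue  # marked as a false positive for this article
        if key in INFORMAL_WORDS:
            issues.append(Issue(word, "informal", match.start()))
        elif key not in KNOWN_WORDS:
            issues.append(Issue(word, "typo", match.start()))
    return issues


if __name__ == "__main__":
    sample = "The editor fixed a tpyo in the Ladsgroup article"
    for issue in find_issues(sample, page_allowlist={"ladsgroup"}):
        print(issue.category, issue.word)
```

The per-page allowlist is checked first, so a word accepted for one article is never flagged there but is still flagged elsewhere.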

Discussion

I think most operating systems and browsers support spellchecking on their side, so this is not needed in MediaWiki. --GPSLeo (talk) 18:40, 16 November 2020 (UTC)

@GPSLeo You wouldn't see the typos unless you go into edit mode. Finding them while reading articles is not doable with browsers and operating systems. Amir (talk) 19:33, 16 November 2020 (UTC)
Ah, you want a tool to find mistakes in articles while reading, not only while editing. I did not get this at first. Now I understand and think this could be useful. --GPSLeo (talk) 21:23, 16 November 2020 (UTC)
The Chrome spellcheck does not work for me when editing. Keepcalmandchill (talk) 03:45, 17 November 2020 (UTC)

There is a similar user script for the MOS called en:User:Ebrahames/Advisor.js on EN.WP. I don't think I've seen a spelling gadget. I also tend to disagree that a spelling gadget is necessary. (Mis)Spellings can be context dependent. --Izno (talk) 21:55, 16 November 2020 (UTC)

@Izno The spelling gadget would just highlight potential spelling mistakes. Even in the fawiki tool, you can mark highlights as false positives on a per-article basis. Amir (talk) 03:33, 22 November 2020 (UTC)

I usually just use Grammarly to check grammar (not sponsored). Félix An (talk) 02:27, 17 November 2020 (UTC)

Would this also take regional variants of English into account? English Wikipedia articles can vary depending on regional relevance or on a "first-come, first-served" basis. Tenryuu (talk) 02:29, 17 November 2020 (UTC)

English is not the only language with spelling variants, so good question. --Izno (talk) 18:08, 17 November 2020 (UTC)

Note that Wikisource also contains many variants of a language: the language of a 100-year-old work differs from today's language, but it is also correct. There should be a project-specific spellchecker that allows local variants. JAn Dudík (talk) 14:09, 18 November 2020 (UTC)

I think the points made by other users about language variation are good, but as long as the changes are not automated and a human is always involved, that person should be able to recognize when a word was incorrectly marked as a misspelling and not act to fix it. For languages that have detailed Wiktionaries, those might be a good source for checking what is and isn't a recognized spelling. This orange links gadget has functionality that might also be relevant to this proposal. --The Editor's Apprentice (talk) 19:21, 20 November 2020 (UTC)

@Ladsgroup: thanks for posting this. How does the Check Dictation tool work? Does it use some open-source Persian spellchecker? Or is it handmade with a list of common misspellings? I ask because the Growth team is building "structured tasks", which use machine learning to help newcomers find specific edits to make, e.g. adding wikilinks. Here are notes from a conversation about how to make spellchecking possible across languages, and we're thinking about whether it would have to be done language by language. -- MMiller (WMF) (talk) 17:19, 23 November 2020 (UTC)

@MMiller (WMF) The code for it is w:fa:مدیاویکی:Gadget-CheckDictation.js and it seems to call a service on Cloud VPS (I didn't write this gadget, so I'm not 100% sure of its internals), but I assume it uses a Unix library for spellchecking. As I said, it has an exception list for each page as well [1].
The fun thing is that this was originally developed to find spelling mistakes, but it grew to cover basically any sort of copy-editing issue, from links to disambiguation pages, to unclosed links/templates, and much more. Amir (talk) 00:28, 24 November 2020 (UTC)
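
As an illustration of one of the non-spelling checks mentioned above (unclosed links/templates), here is a minimal sketch. It is not how the fawiki gadget works, whose internals are not described here. The function name `unclosed_delimiters` is hypothetical, and the check deliberately ignores real wikitext complications such as `<nowiki>` sections, triple-brace parameters, and stray closing delimiters.

```python
def unclosed_delimiters(wikitext):
    """Return (opener, offset) pairs for [[ or {{ that are never closed.

    Simplified assumption: only the two-character delimiters [[ ]] {{ }}
    are considered, and closers that do not match the innermost open
    delimiter are skipped rather than reported.
    """
    pairs = {"[[": "]]", "{{": "}}"}
    closers = set(pairs.values())
    stack = []  # open delimiters seen so far, innermost last
    i = 0
    while i < len(wikitext) - 1:
        token = wikitext[i:i + 2]
        if token in pairs:
            stack.append((token, i))
            i += 2
        elif token in closers:
            # Pop only when the closer matches the innermost opener.
            if stack and pairs[stack[-1][0]] == token:
                stack.pop()
            i += 2
        else:
            i += 1
    return stack


if __name__ == "__main__":
    print(unclosed_delimiters("See [[Tehran and {{cite web}}"))
```

Whatever remains on the stack at the end is an opener without a matching closer, and its offset tells a UI where to place the highlight.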

I would support the idea, but in the context of a typographic checker, not just a spellchecker. It would check grammar, adjectives, orthography, etc. MarioSuperstar77 (talk) 21:06, 24 November 2020 (UTC)

  • I'm merging a similar wish:
    • Problem: Arabic: a built-in proofreading tool for text, similar to what the Word program does
    • Proposer: عمر الشامي (talk) 21:09, 22 November 2020 (UTC)

SGrabarczuk (WMF) (talk) 20:25, 3 December 2020 (UTC)

A spellchecker and a grammar checker would both be useful tools. They should be separate tools. The spellchecker should have the ability to set the English variety. I have encountered many articles that use several varieties, and it would be a useful tool for editing them to the desired variety. User-duck (talk) 18:32, 8 December 2020 (UTC)

English Wikipedia already has an active spellchecking project that finds spelling errors and a small number of manual of style violations in the latest database dump - see en:Wikipedia:Typo_Team/moss. We're currently doing this by making wiki pages full of lists and relying on editors to go through the lists. It's taking years to get to all the likely typos, and though we're catching up, of course more are added all the time. Any UI that increases automation of this task, either by interested volunteers working from lists or by capturing work done by folks who just happened to be reading the article, would be very helpful. We are slowly starting to advertise problematic cases to readers using tags in the articles themselves (see en:Template:Typo help inline). This sort of tag could be a hook for a little interactive UI that resolves the spelling issue into a small number of bins (add to dictionary, proper noun, change to correct spelling, unsure). Or a reader-centric spell checker could find typos on its own without help from tags. Though there's something to be said for storing "not a typo" sorts of information in the article itself, so that if a different spelling or grammar checker comes by later, we won't duplicate work. As for dialect detection...many English Wikipedia articles also have templates declaring the preferred dialect, and in some cases the category membership associates an article with a specific country, too. But even without these things in most cases I think it's pretty easy to tell which dialect a page is mostly or completely written in. Wiktionary already knows which words go with which dialect, and we can simply count up the number that are unique to one or the other. Any reader's web browser's built-in spell checker is probably going to properly handle only their own dialect, and that's too cumbersome for most readers to change. (So it's helpful to build a new system that's smart enough to deal with multiple dialects.) 
-- Beland (talk) 08:37, 12 December 2020 (UTC)
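
The counting approach to dialect detection described in the comment above can be sketched as follows. This is only an illustration under stated assumptions: the word lists are a tiny hypothetical sample (a real tool might derive them from Wiktionary, as suggested), and `guess_dialect` is a made-up name, not part of any existing project.

```python
import re
from collections import Counter

# Hypothetical sample of words that appear in only one variety.
DIALECT_ONLY_WORDS = {
    "en-GB": {"colour", "centre", "travelled", "honour"},
    "en-US": {"color", "center", "traveled", "honor"},
}


def guess_dialect(text):
    """Count dialect-specific words and return (best_guess, counts).

    best_guess is None when no dialect-specific word is found or when
    the top two dialects are tied.
    """
    counts = Counter()
    for word in re.findall(r"[a-z]+", text.lower()):
        for dialect, words in DIALECT_ONLY_WORDS.items():
            if word in words:
                counts[dialect] += 1
    if not counts:
        return None, counts
    ranked = counts.most_common(2)
    if len(ranked) == 2 and ranked[0][1] == ranked[1][1]:
        return None, counts  # tie: leave the decision to a human
    return ranked[0][0], counts


if __name__ == "__main__":
    sample = "The colour of the city centre; he travelled to the center."
    dialect, counts = guess_dialect(sample)
    print(dialect, dict(counts))
```

Words belonging to the minority dialect (here "center") are then natural candidates to show an editor, who decides whether to change them.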

Voting