Community Wishlist/Wishes/Better article translation system

From Meta, a Wikimedia project coordination wiki
Better article translation system Submitted

Description

  • When translating an English Wikipedia article into German with the Content Translation tool, the tool can't machine translate it. The tool is therefore useless in most cases, and one is better off starting a new article or draft without it.
  • When using the MinT tool on testwiki, one can use machine translation (Google Translate or MinT) when translating ENWP -> DEWP with the Content Translation tool. Once finished, one can copy the wikitext and use it for the new page. The problems with that:
    • One can't get the wikitext unless the translation has been modified a lot. For long articles that is a problem because 1. it can be far too much work, 2. the two-column Content Translation visual editor GUI is not good to use, and I prefer/need a wikitext editor for proofreading and improvements, and 3. the translations are often quite good and don't need that much post-editing. Several other users have also pointed this out, e.g. here. Again, the translator GUI is not well suited to reading and improving the article; that needs the wikitext version.
    • One can't generate the translation in one go; one has to click one paragraph at a time. This has been pointed out, e.g. here.
    • It often translates refs in a weird way. For example, this is how it translated a ref from EN to DE (it should contain just the Cite web template, not the <cite> element next to it):
<ref name="SETI exoplanets">{{Cite web|url=http://news.discovery.com/space/seti-to-hunt-for-aliens-on-keplers-worlds.html|title=SETI to Hunt for Aliens on Kepler's Worlds|date=5 December 2011|last=Ian O'Neill|publisher=[[Discovery News]]|archive-date=30 August 2012|url-status=dead|archive-url=https://web.archive.org/web/20120830111432/http://news.discovery.com/space/seti-to-hunt-for-aliens-on-keplers-worlds.html}}<cite class="citation web cs1" data-ve-ignore="true" id="CITEREFIan_O'Neill2011">Ian O'Neill (5 December 2011). [https://web.archive.org/web/20120830111432/http://news.discovery.com/space/seti-to-hunt-for-aliens-on-keplers-worlds.html "SETI to Hunt for Aliens on Kepler's Worlds"]. [[Discovery (Unternehmen)|Discovery News]]. Archived from [http://news.discovery.com/space/seti-to-hunt-for-aliens-on-keplers-worlds.html the original] on 30 August 2012.</cite></ref>
In addition, it adds named refs multiple times rather than just once; I think this wish is about that.
  • Note: the benefit of this approach is that it understands wikilinks and links to the proper page in the other Wikipedia, so it may still be the best option for articles with lots of wikilinks.
  • When using DeepL, I can only translate up to 5,000 words a day, which is often less than one article. When using Google Translate, the problems are:
    • It can't deal with templates: even if a template exists in the Wikipedia the article is translated to, it doesn't add that template. This is a rather minor issue in comparison.
    • For wikilinks, it doesn't know the article title in the other language (unlike MinT). The page can therefore end up with lots of redlinks, whereas it should, e.g., leave terms with no article in the target language unlinked and add the proper wikilink for those that do have an article.
    • After every 5,000 words there are issues with the translation, because the text isn't sliced so that each translated unit ends at the end of a sentence and the next one starts with a new sentence. A unit often ends in the middle of a sentence, which causes mistranslations.
    • It also translates the reference titles, so all of the news article titles get translated. One can't exclude the <ref>…</ref> content so that it stays entirely untranslated.
    • References that are not defined inline must each be inserted manually in the correct place.
    • In German Wikipedia, all English-language refs need the parameter language=en, and when translating from EN to DE that is usually all or nearly all refs. What's missing is a way to add it to all refs at once (one could check afterwards) or to detect the language (e.g. via the ref title).
    • Main problem: it also translates some ref parameters that shouldn't be translated, and some parameters (translated or not) don't exist in DEWP. Fixing all the refs (incl. the point above) actually takes more time than proofreading and is so exhausting that one has little motivation or energy left to do all the proofreading afterwards. It was like that for an article with just 100 refs or fewer; now imagine one with 600. Also note that the gap in the number of articles compared to ENWP is in the millions. The issues with the references are the main problem.
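
To illustrate how the reference problems listed above might be worked around by hand today, here is a minimal Python sketch, not an existing tool. It swaps every ref for a placeholder before the text goes to an external translator, restores the untouched refs afterwards, and appends language=en to citation templates that lack it. The placeholder format (XXREF0XX), the function names, and the list of template names are assumptions made for this example, and whether a given translator leaves such placeholders intact would need to be checked.

```python
import re

# Matches self-closing <ref ... /> tags and <ref ...>...</ref> pairs.
REF_RE = re.compile(r"<ref\b[^>]*?/>|<ref\b[^>]*>.*?</ref>", re.DOTALL | re.IGNORECASE)
# Matches simple citation templates without nested templates (assumed name list).
CITE_RE = re.compile(r"\{\{\s*cite (?:web|news|journal|book)\b[^{}]*\}\}", re.IGNORECASE)
PLACEHOLDER_RE = re.compile(r"XXREF(\d+)XX")


def mask_refs(wikitext):
    """Replace each ref with a placeholder; return the masked text and the refs."""
    refs = []

    def stash(match):
        refs.append(match.group(0))
        return f"XXREF{len(refs) - 1}XX"  # hypothetical placeholder format

    return REF_RE.sub(stash, wikitext), refs


def unmask_refs(text, refs):
    """Put the stored refs back in place of their placeholders."""
    return PLACEHOLDER_RE.sub(lambda m: refs[int(m.group(1))], text)


def add_language(ref, lang="en"):
    """Append |language=<lang> to citation templates that have no language parameter."""

    def patch(match):
        template = match.group(0)
        if re.search(r"\|\s*language\s*=", template):
            return template  # already set, leave untouched
        return template[:-2] + f"|language={lang}" + "}}"

    return CITE_RE.sub(patch, ref)


source = 'Kepler.<ref name="a">{{Cite web|url=http://x.org|title=T}}</ref> More.<ref name="a" />'
masked, refs = mask_refs(source)           # send `masked` to the translator
refs = [add_language(r) for r in refs]     # refs stay untranslated
restored = unmask_refs(masked, refs)       # in practice: unmask the translated text
```

Because the refs never reach the translator, their titles and parameters stay as they were in the source article; parameters that don't exist in DEWP would still need separate handling.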

So currently, the third option (external machine translation such as Google Translate) usually seems to be the best way. As far as I can see, starting from a translation of an already existing comprehensive Wikipedia article (usually in ENWP) is the most time-efficient way to create new articles in any Wikipedia other than English Wikipedia (rather than recreating from scratch what's already available).
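
For the third option, the mid-sentence cuts at the word limit could be avoided by pre-splitting the text oneself. The following is a rough Python sketch under simple assumptions: sentences end with ".", "!" or "?" followed by whitespace (so abbreviations will be split wrongly), and the 5,000-word default mirrors the limit mentioned above. The function name is made up for this example.

```python
import re


def chunk_by_sentences(text, max_words=5000):
    """Split text into chunks of at most max_words words, cutting only between sentences.

    Whitespace (including paragraph breaks) is preserved, so joining the
    chunks gives back the original text. A single sentence longer than
    max_words becomes its own oversized chunk.
    """
    # With a capturing group, re.split returns: sentence, gap, sentence, gap, ..., sentence
    pieces = re.split(r"(?<=[.!?])(\s+)", text)
    chunks, current, count = [], "", 0
    for i in range(0, len(pieces), 2):
        sentence = pieces[i]
        gap = pieces[i + 1] if i + 1 < len(pieces) else ""
        words = len(sentence.split())
        if current and count + words > max_words:
            chunks.append(current)
            current, count = "", 0
        current += sentence + gap
        count += words
    if current:
        chunks.append(current)
    return chunks


sample = "One two three. Four five.\n\nSix seven eight nine."
parts = chunk_by_sentences(sample, max_words=5)
# parts == ["One two three. Four five.\n\n", "Six seven eight nine."]
```

Each chunk can then be translated separately and the results concatenated in order.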

A possibly better approach, although the two could be complementary and it could take a while, is what's proposed at Community Wishlist/Wishes/Wikipedia Machine Translation Project. That would be more efficient, would take advantage of synergies and at-once corrections across many articles, and would mitigate the issue of articles in the target language becoming outdated while the source article continues to be improved.

Nevertheless, I think a good one-article-at-a-time machine translation system would be great progress and is needed. I think it's now possible and would, at the least, greatly improve conventional non-ENWP article creation and translation work.
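
As an illustration of the kind of post-processing such a system could apply to the Content Translation ref output shown in the description, here is a small Python sketch. It assumes the unwanted rendered citation is always a <cite> element carrying data-ve-ignore, as in that one example, and that "named refs added multiple times" means the full definition is repeated under the same name. Both functions are hypothetical helpers, not part of any existing tool.

```python
import re

# Rendered citation HTML that sits next to the template, as in the example ref.
RENDERED_CITE_RE = re.compile(
    r"<cite\b[^>]*\bdata-ve-ignore\b[^>]*>.*?</cite>", re.DOTALL | re.IGNORECASE
)
# Full definitions of named refs: <ref name="...">...</ref> (no other attributes).
NAMED_REF_RE = re.compile(
    r'<ref\s+name\s*=\s*"([^"]+)"\s*>(.*?)</ref>', re.DOTALL | re.IGNORECASE
)


def strip_rendered_cites(wikitext):
    """Remove <cite ... data-ve-ignore ...>...</cite> elements."""
    return RENDERED_CITE_RE.sub("", wikitext)


def dedupe_named_refs(wikitext):
    """Keep the first definition of each named ref; turn repeats into <ref name="..." />."""
    seen = set()

    def reuse(match):
        name = match.group(1)
        if name in seen:
            return f'<ref name="{name}" />'
        seen.add(name)
        return match.group(0)

    return NAMED_REF_RE.sub(reuse, wikitext)


raw = (
    'A<ref name="n">{{Cite web|title=T}}'
    '<cite class="citation web cs1" data-ve-ignore="true" id="X">T.</cite></ref> '
    'B<ref name="n">{{Cite web|title=T}}'
    '<cite class="citation web cs1" data-ve-ignore="true" id="X">T.</cite></ref>'
)
clean = dedupe_named_refs(strip_rendered_cites(raw))
# clean == 'A<ref name="n">{{Cite web|title=T}}</ref> B<ref name="n" />'
```

The result would still need a manual check, since refs with additional attributes (such as a group) are not covered by this pattern.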

Assigned focus area

Unassigned.

Type of wish

Feature request

Wikipedia

Affected users

Wikipedia readers and editors of non-English Wikipedias

Other details

  • Created: 00:03, 23 April 2025 (UTC)
  • Last updated: 15:00, 27 April 2025 (UTC)
  • Author: Prototyperspective (talk)