User:Masoris/Edit Page Suggestion for Automatic Conversion System

From Meta, a Wikimedia project coordination wiki
  • I’m so sorry about my terrible English. Please help me with my English.

This page describes how to let users edit pages with the source shown in their preferred writing system. This is just my suggestion, and I would like to discuss it on the discussion page. Everyone is welcome to edit this page, add opinions, and correct the English.

Currently, the Chinese[1], Serbian, and Kazakh language Wikipedias support multiple writing systems through an automatic conversion system.[2] This is very useful for Wikipedians who cannot read or write the other writing systems. However, they still run into problems when using Wikipedia, and one of them is the edit page. The edit page is NOT converted, so the page source is a mixture of several writing systems. Editing such pages can therefore be hard, uncomfortable, and inefficient for these Wikipedians.[3] If Wikipedia let them edit the page in their own writing system, contributing would be much easier and more effective.


Structure

MediaWiki already supports automatic conversion between Simplified and Traditional Chinese, based on word and character correspondence lookup. We want to extend this system so that it can convert the page source too. Since we would still be using this system, I think the server should still send the converted data.

Backend

We want the small units of word/character correspondence to stay separate. For ease of discussion, let's define a new type of tag, <edit_acs data-orig="original text">transformed display</edit_acs>, which preserves each unit of text. For example, the string 電腦新聞 can be divided into three units for conversion: 電腦, 新, and 聞. The resulting string sent by the server will be something like <edit_acs data-orig="電腦">计算机</edit_acs>新<edit_acs data-orig="聞">闻</edit_acs>, with a tag assigned to each converted unit. If the post-transform text is the same as the pre-transform text, the tag is not added.
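
The tagging step could be sketched roughly as follows in Python. This is only an illustration under assumptions: the table `TABLE` and the function `tag_units` are hypothetical names, the table holds just the two entries needed for the example, and greedy longest-match lookup is one possible strategy, not necessarily what MediaWiki's converter does.

```python
from html import escape

# Hypothetical conversion table (zh-tw -> zh-cn); illustrative entries only.
TABLE = {"電腦": "计算机", "聞": "闻"}


def tag_units(text, table=TABLE):
    """Wrap each converted unit in an edit_acs tag; unchanged text stays bare."""
    longest = max(map(len, table), default=1)
    out = []
    i = 0
    while i < len(text):
        # Try the longest candidate first so words win over single characters.
        for size in range(min(longest, len(text) - i), 0, -1):
            unit = text[i:i + size]
            converted = table.get(unit)
            if converted is not None and converted != unit:
                out.append('<edit_acs data-orig="%s">%s</edit_acs>'
                           % (escape(unit, quote=True), escape(converted)))
                i += size
                break
        else:
            # No conversion applies here: emit the character without a tag.
            out.append(escape(text[i]))
            i += 1
    return "".join(out)


print(tag_units("電腦新聞"))
```

With this table, 電腦 is tagged as one unit, 新 is left untagged because it has no conversion, and 聞 gets its own single-character tag.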

Edit box (presentation)

The transformed text, i.e. the hypothetical edit_acs tag, should be easy to see. For example, we can use blue text in the edit box. An optional caption showing the original text can be added with the title= attribute. We follow the same formatting convention on this page, except that we use <font color=blue title=foo>某</font> instead of the made-up tag. If you use Japanese, you will find this system similar to ruby tagging, but displayed in a rather different way.

For a user of zh-Hans-CN (Mainland China) editing a zh-Hant-TW page whose source contains the string 中國電腦新聞動態, the conversion table would be:

Conversion Table
TW CN
電腦 计算机

So 電腦 in the source would be displayed as 计算机.

In Chinese, such a character-by-character approach is necessary because the language does not use spaces to mark word boundaries. For other languages it might be better to use spaces as boundaries, since words with inappropriately mixed scripts (e.g. 動态) can harm future auto-conversion.

Note that the implementation must be copy/paste-safe, which means that the original text must be kept during this procedure. It would be even better if the whole tagged blue group could be copied. A custom edit area similar to Wiki-Ed is necessary, since this introduces rich-text presentation.

Korean Example in ko:부산광역시

Here is another example, in Korean. The first image is Korean with hanja (unconverted), and the second is Korean in hangul only (converted).[4]


Korean Example 2 in ko:세종대왕

Unconverted Edit Page
'''世宗大王'''(-{世宗大王}-, [[1397年]] [[陰歷 4月 10日]]/陽歷 [[5月 6日]] - [[1450年]] [[陰曆 4月 8日]]/陽曆 [[5月 18日]], 在位 [[1418年]] - [[1450年]])은 [[朝鮮]]의 第4代 [[왕|임금]]이다. [[諱]]는 祹, [[자 (이름)|字]]는 元正, [[諡號]]는 世宗莊憲英文睿武仁聖明孝大王이다. [[조선 태종|太宗]]과 [[元敬王后]]의 셋째 아들이다.

世宗大王은 在位期間 동안 國防과 科學 및 經濟, 文化 등 全 分野에 걸쳐 찬란한 業績을 많이 남겨偉大한 聖君으로 推仰받고 있다. 現在 [[大韓民國 貨幣]]의 最高額 [[貨幣]]인 10000원권의 人物이기도 하다.
Converted Edit Page
'''세종대왕'''(-{世宗大王}-, [[1397년]] [[음력 4월 10일]]/양력 [[5월 6일]] - [[1450년]] [[음력 4월 8일]]/양력 [[5월 18일]], 재위 [[1418년]] - [[1450년]])은 [[조선]]의 제4대 [[왕|임금]]이다. [[휘]]는 도, [[자 (이름)|자]]는 원정, [[시호]]는 세종장헌영문예무인성명효대왕이다. [[조선 태종|태종]]과 [[원경왕후]]의 셋째 아들이다.

세종대왕은 재위기간 동안 국방과 과학 및 경제, 문화 등 전 분야에 걸쳐 찬란한 업적을 많이 남겨위대한 성군으로 추앙받고 있다. 현재 [[대한민국 화폐]]의 최고액 [[화폐]]인 10000원권의 인물이기도 하다.

[5]

In Editing

When a ruby group is changed, the ruby-fication is broken, and the style, as well as the original-text data, is removed. For example, take the rubied word 计算机 (original data: 電腦), which forms a single group. When someone edits inside it, e.g. removes the character 机, the ruby group is broken and the text becomes the ordinary string 计算.

Broken ruby groups MUST NOT be recovered by ordinary re-typing. Starting with 新闻动态, one deletes the last character and gets 新闻动. Re-typing the character gives 新闻动态 again, but the final 态 is now ordinary text, because the ruby info is already lost. Note that in this example the four characters are assigned single-character ruby groups, unlike the 计算机 example, where the whole word gets one conversion.
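
The editing rule can be modeled with a small Python sketch. Everything here is hypothetical and for illustration only: `Run`, `delete_last_char`, and `type_text` are made-up names, and the edit box content is assumed to be a list of runs, each either a ruby group (with original text) or plain text. The example data assumes the displayed text 新闻动态 with originals 聞, 動, and 態.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Run:
    shown: str                  # text displayed in the edit box
    orig: Optional[str] = None  # original text of a ruby group; None for plain text


def delete_last_char(runs: List[Run]) -> List[Run]:
    """Delete the final displayed character; an edited ruby group becomes plain."""
    if not runs:
        return []
    *head, last = runs
    rest = last.shown[:-1]
    # Any edit inside a ruby group discards its original-text data.
    return head + ([Run(rest)] if rest else [])


def type_text(runs: List[Run], text: str) -> List[Run]:
    """Append typed text as plain text; typing never revives a lost ruby group."""
    return runs + [Run(text)]


runs = [Run("新"), Run("闻", "聞"), Run("动", "動"), Run("态", "態")]
after = type_text(delete_last_char(runs), "态")
print(after[-1])  # the retyped 态 carries no original text
```

Deleting inside a multi-character group such as `Run("计算机", "電腦")` leaves a plain `Run("计算")`, which matches the 计算机 example.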

Save Page

On saving the page, the client filters out all the ruby tags and replaces each one with the "original text" property stored for that block. The ruby data is for presentation only and should not be sent. The black (i.e. normal) text is sent as-is.

Starting from our previous example, the client will filter the text into 新聞動态 and send this to the server.
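
A rough Python sketch of this filtering step is shown below, assuming the edit box content is available as a string using the hypothetical edit_acs tag with a data-orig attribute. The class `OriginalTextExtractor` and the function `text_to_submit` are illustrative names. The sketch handles plain text only: real wikitext containing its own tags (such as ref) or character references would need extra care, because the standard-library HTML parser used here drops unknown tags and decodes references.

```python
from html.parser import HTMLParser


class OriginalTextExtractor(HTMLParser):
    """Rebuild the text to submit: data-orig for edit_acs groups, plain text as-is."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self.depth = 0  # greater than 0 while inside an edit_acs group

    def handle_starttag(self, tag, attrs):
        if tag == "edit_acs":
            self.depth += 1
            self.parts.append(dict(attrs).get("data-orig") or "")

    def handle_endtag(self, tag):
        if tag == "edit_acs" and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        # Displayed text inside a group is presentation only and is skipped.
        if self.depth == 0:
            self.parts.append(data)


def text_to_submit(markup):
    parser = OriginalTextExtractor()
    parser.feed(markup)
    parser.close()
    return "".join(parser.parts)


edited = ('新<edit_acs data-orig="聞">闻</edit_acs>'
          '<edit_acs data-orig="動">动</edit_acs>态')
print(text_to_submit(edited))
```

In the input above, 态 is plain text (its ruby group was broken by retyping), so it is sent unchanged while 闻 and 动 are replaced by their originals.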

Exception

Manual Markup

Manual markup should not be given ruby, because no one wants it to be changed in the edit box.

For example:

  • -{zh-cn: 计算机; zh-tw: 電腦}- => -{zh-cn: 计算机; zh-tw: 電腦}- ( not -{zh-cn: 计算机; zh-tw: 电脑}- )
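
One way to honor this exception is to split the source on manual markup before converting, as in the Python sketch below. It is an illustration only: `MANUAL` and `convert_outside_manual` are hypothetical names, the pattern assumes -{...}- blocks are not nested, and `convert` stands for whatever conversion function is applied to ordinary text.

```python
import re

# Matches manual conversion markup such as -{zh-cn: 计算机; zh-tw: 電腦}- (non-nested).
MANUAL = re.compile(r"(-\{.*?\}-)", re.DOTALL)


def convert_outside_manual(text, convert):
    """Apply `convert` to ordinary text only; manual markup passes through unchanged."""
    pieces = MANUAL.split(text)
    # With one capturing group, odd indexes hold the matched markup.
    return "".join(p if i % 2 else convert(p) for i, p in enumerate(pieces))


def to_cn(s):
    # Stand-in converter for the example; a real one would use the full table.
    return s.replace("電腦", "计算机")


print(convert_outside_manual("電腦 -{zh-cn: 计算机; zh-tw: 電腦}-", to_cn))
```

Only the 電腦 outside the markup is converted; the zh-tw: 電腦 inside -{...}- is left exactly as written.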

Functions

This suggestion needs a bit of programming.

Server Side

The server needs the following function added:

  1. Send data with ruby tags.

Client Side

The client (edit box) needs the following functions added:

  1. Handle characters that have ruby.
  2. Color characters that have ruby.
  3. Copy and paste text together with its ruby.
  4. Send only the original data when saving.

Notes
