Talk:Language proposal policy/2018

Please do not post any new comments on this page. This is a discussion archive first created in 2018, although the comments contained were likely posted before and after this date. See current discussion.

Better version of code requirement


2. The language must have a valid ISO 639 (-1, -2, -3, -6), BCP 47, or LinguistList code.

If there is no valid language code, you must obtain one. The Wikimedia Foundation does not seek to develop new linguistic entities; there must be an extensive body of works in that language or dialect. The information that distinguishes this language or dialect from another must be sufficient to convince SIL, IETF, or Linguist List to create a code.

I feel that this allows more languages to be added. Lojbanist (talk) 23:16, 31 March 2018 (UTC)

I see you've renamed your account. Once again, there's absolutely no evidence that ISO 639-3 is insufficient for Wikimedia's purposes; only a few languages have been eligible in every respect except for lacking an ISO 639-3 code, and those have managed to obtain one. This should be driven by at least one and preferably multiple projects whose main problem is the lack of an ISO 639-3 code.--Prosfilaes (talk) 07:27, 1 April 2018 (UTC)

I think you will find LangCom opposed to this idea. Consider the following, with respect to languages for which there is not a valid ISO 639–3 code:

All ISO 639–1 codes have corresponding ISO 639–2 codes.

@StevenJ81: Oppose. Serbo-Croatian has a deprecated ISO 639-1 code, sh, and an ISO 639-3 code, hbs, but it does not actually have an ISO 639-2 code. --Liuxinyu970226 (talk) 15:26, 16 January 2019 (UTC)

All ISO 639–2 codes have corresponding ISO 639-3 codes, with the exception of those marked as collections.
By definition, "collections" are not languages, and not even macrolanguages. They are collections of languages. As such, they are not eligible for projects in their own right. Their constituent members certainly are. (There are some active projects with collection codes, but they are grandfathered. At some point we will try to move those to different language codes.)

Some of the resulting ISO 639–3 codes become macrolanguages. That subject is a complex one that I will not elaborate on here. But either the macrolanguage itself or its constituents will always be eligible.
ISO 639–6 was withdrawn, so it really has no status. Moreover, certain 639–6 codes represented language variants that LangCom and WMF most assuredly do not want to allow routinely as separate projects, such as script variants and historic variants.
Disclaimer: We would probably use the ISO 639–6 code for Wawa if a project were ever created. But that's because its ISO 639–3 code is www, which would cause no end of problems to implement.
Under certain circumstances, LangCom is (in theory) willing to entertain a project for a language having no ISO 639–3 code, but having a BCP 47 code. See Language committee/Voting policy. Such languages would require a 2/3 affirmative vote to be allowed, and in practice such languages will not get the 2/3 vote unless they first try to get an ISO 639–3 code but fail.
Linguist List doesn't assume that everything that it tracks is a language.

StevenJ81 (talk) 21:52, 4 April 2018 (UTC)
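
The code-related reasoning in the comment above can be sketched roughly as a decision function. This is only an illustration of the points as stated there, not LangCom's actual procedure: the names `CodeInfo`, `Verdict`, and `assess_code_requirement` are hypothetical, the BCP 47 tag in the usage lines is a made-up placeholder, and the sketch covers only the language-code requirement, not the other eligibility criteria.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Verdict(Enum):
    MEETS_CODE_REQUIREMENT = "meets the code requirement"
    NEEDS_TWO_THIRDS_VOTE = "needs a 2/3 affirmative LangCom vote"
    DOES_NOT_MEET = "does not meet the code requirement"


@dataclass(frozen=True)
class CodeInfo:
    """Hypothetical record of the codes known for one language."""
    iso_639_3: Optional[str] = None      # e.g. "hbs" for Serbo-Croatian
    is_collection: bool = False          # ISO 639-2 "collection" code
    bcp_47: Optional[str] = None         # BCP 47 tag, if any
    iso_639_6: Optional[str] = None      # withdrawn standard, no status
    linguist_list: Optional[str] = None  # not assumed to denote a language


def assess_code_requirement(info: CodeInfo) -> Verdict:
    # Collections are groups of languages, not languages themselves.
    if info.is_collection:
        return Verdict.DOES_NOT_MEET
    # An ISO 639-3 code (individual language or macrolanguage) suffices.
    if info.iso_639_3:
        return Verdict.MEETS_CODE_REQUIREMENT
    # A BCP 47 code alone may be entertained, subject to a 2/3 vote.
    if info.bcp_47:
        return Verdict.NEEDS_TWO_THIRDS_VOTE
    # ISO 639-6 or Linguist List codes alone are not enough.
    return Verdict.DOES_NOT_MEET


print(assess_code_requirement(CodeInfo(iso_639_3="hbs")).value)
print(assess_code_requirement(CodeInfo(bcp_47="xx-placeholder")).value)
```

The Wawa case mentioned in the disclaimer (preferring an ISO 639–6 code because www is awkward to implement) is deliberately left out, since it is described as a one-off exception rather than a rule.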

Flaws of ISO-639-3


The three-letter codes themselves are problematic because, while officially arbitrary technical labels, they are often derived from mnemonic abbreviations for language names, some of which are pejorative. For example, Yemsa was assigned the code [jnj], from the pejorative "Janejero". These codes may thus be considered offensive by native speakers, but codes in the standard, once assigned, cannot be changed.
The administration of the standard is problematic because SIL is a Christian missionary organization with inadequate transparency and accountability. Decisions as to which proposals for new language codes should be approved are made internally at SIL, with very little outside input. LinguistList, the only people whom SIL runs proposals by, have created their own extended version of ISO-639-3, and they're a public mailing list instead of a secretive group trying to convert isolated tribes to Christianity, so why can't we use a combination of ISO-639, LinguistList, and IETF codes instead of just the ISO codes?
Languages and dialects often cannot be rigorously distinguished.

Lojbanist (talk) 01:08, 24 June 2018 (UTC)

Again? You've beaten this drum before, and nothing has changed. Wikimedia has ignored ISO 639-3 a couple of times, as with nds-nl and nds-de, and by merging dialects, but in general, ISO 639-3 has worked fine. People can ask for any language they want, and if some language not encoded in ISO 639-3, that SIL refuses to encode in ISO 639-3, is nonetheless deemed eligible, then an IETF code or local code will be found for it. Likewise, if there's an actual community that wants a Wikimedia project but finds the ISO 639-3 code offensive, adding IETF and LinguistList codes won't help anything, and I'm sure Wikimedia can figure out an alternate code.

There's no actual problem; most of the time ISO 639-3 works just fine, and there are no alternatives that could replace it and work better.--Prosfilaes (talk) 07:04, 27 June 2018 (UTC)

I'm a member of the IETF-languages list, and it doesn't encode languages in practice. Michael Everson (the maintainer) once threatened to encode one there, in order to get SIL to encode it in ISO 639-3, but ISO 639-3 did in fact encode that lect of Sweden as a separate language. In practice, IETF-languages encodes dialects and writing standards, not anything Wikimedia needs to worry about.--Prosfilaes (talk) 07:04, 27 June 2018 (UTC)