Lingua Libre/GSOC25
The following is a proposed Wikimedia document. References or links to this page should not describe it as supported, adopted, common, or effective.
The proposal is in development: it may still be very experimental, may not work as currently described or intended, and may never be finalized. IMPORTANT: these projects are not confirmed yet; between 0 and 3 of them could be led into GSOC25 or Outreachy Round 30. See also phab:T385383.
Lingua Libre extension
This section is currently a draft. You can improve it.

In the field of language diversity, the Wikimedia Foundation and Wikimedia France have supported LinguaLibre.org, a single-page VueJS application for rapidly recording the vocabularies of the world. Over 280 languages and 1.3 million words have been audio-recorded into Wikimedia sites through this open project.
A recent Django/VueJS/MariaDB revamp of the core app broke several valuable add-ons. These front-end features should be rebuilt on top of the new database:
- overall languages dashboard (legacy),
- versatile search page (legacy),
- statistics (legacy: Global stats, Languages, Speakers, Chronological),
- minimal bilingual dictionary system, ideally with a minimalist micro-learning feature (legacy: 1, 2).
This will likely require extending the Lingua Libre APIs as well.
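
As an illustration of the statistics views listed above (global, languages, speakers, chronological), here is a minimal sketch in Python, the language of the Django back end. It is not the project's implementation: the `Recording` record shape, the `summarize` function, and the sample data are all hypothetical, and the real Lingua Libre database schema may differ.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Recording:
    # Hypothetical record shape; the real Lingua Libre schema may differ.
    language: str      # e.g. an ISO 639-3 code
    speaker: str
    word: str
    recorded_on: date


def summarize(recordings):
    """Aggregate recordings into four views: global, by language, by speaker, by month."""
    recordings = list(recordings)
    return {
        "global": {
            "recordings": len(recordings),
            "languages": len({r.language for r in recordings}),
            "speakers": len({r.speaker for r in recordings}),
        },
        "by_language": dict(Counter(r.language for r in recordings)),
        "by_speaker": dict(Counter(r.speaker for r in recordings)),
        # Chronological view: one bucket per calendar month (YYYY-MM).
        "by_month": dict(Counter(r.recorded_on.strftime("%Y-%m") for r in recordings)),
    }


# Made-up sample data, for illustration only.
SAMPLE = [
    Recording("fra", "alice", "bonjour", date(2024, 1, 5)),
    Recording("fra", "bob", "merci", date(2024, 1, 20)),
    Recording("oci", "alice", "adieu", date(2024, 2, 3)),
]

if __name__ == "__main__":
    print(summarize(SAMPLE))
```

In a real deployment these counts would more likely be computed by database queries and exposed through an API endpoint; the in-memory version only shows the shape of the data a statistics page could consume.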

- Tech stack: VueJS, Django (Python), NodeJS, MariaDB
- Size: 350 hours
- Difficulty: Intermediate
- Mentor(s): Yug, {TBA}
- Intern: {Username} TBA
- Phabricator task: TBA
- Relevant links: Repository (demo), Phabricator dashboard, Lingua Libre.
Lingua Libre IOT
This section is currently a draft. You can improve it.

See: https://hugolpz.github.io/NamesOfTheLand
Lingua Libre provides pioneering digital material for locals and minorities. Following the 2024 collaboration with the Occitan Whistle public exhibit and the creation of a physical interactive map, we want to develop real-life, open source IOT applications that showcase Lingua Libre linguistic data. Target reusers are cultural exhibits, municipal councils, local communities, and local Wikimedians.
Technology | Item | Works with internet | Allocated time |
---|---|---|---|
JS or VueJS, LeafletJS | Interactive map table | Yes | 2 weeks |
JS or VueJS | Interactive poster table | Yes | 2 weeks |
JS | QR code to webpages for areas with internet access | Yes | 2 weeks |
Arduino, solar powered, screen (?) | IOT speaker box with preprogrammed content | Without | 6 weeks |
These base demonstrators would provide physical, table-sized displays in local museums, where visitors could press the names of villages, places, or objects and hear the native-language name for each item. A complementary idea is physical play boxes along mountain hiking paths where the internet is not available: visitors could read the minimal instructions, press the box, and hear the native-language audio for something they see.
- Tech stack: Arduino (or equivalent), minimal web coding ability.
- Size: 350 hours
- Difficulty: Intermediate
- Mentor(s): Yug, {TBA}
- Intern: {Username} TBA
- Phabricator task: TBA
- Relevant links: TBA.
Spell4Wiki & Lingua Libre
This section is currently a draft. You can improve it.

Align Spell4Wiki and Lingua Libre, and give Spell4Wiki access to Lingua Libre's item lists.
- Tech stack: Android SDK (or equivalent).
- Size: 350 hours
- Difficulty: Intermediate
- Mentor(s): TBA, {TBA}
- Intern: {Username} TBA
- Phabricator task: TBA
- Relevant links: TBA.
WikiSpeech & Lingua Libre integration
This section is currently a draft. Cancelled: the WikiSpeech team confirmed that their TTS project already has academic researchers working on it, with no clear need for a GSOC intern.
WikiSpeech aims to offer a hyper-multilingual, open source "Listen to this article" text-to-speech (TTS) service to all Wikipedia projects.
WikiSpeech-Lingua Libre meetup
Summary: WikiSpeech's unique TTS challenge is to provide a TTS service while keeping a wiki-like correction feedback channel, so that key words in Wikipedia articles are read as accurately as possible and can be corrected rapidly, and this in multiple languages. WikiSpeech's TTS expertise is provided by Swedish machine learning (ML) research centers, which do not need a dedicated ML GSOC25 intern. Still, Yug reminds the team that this remains possible if wanted (deadline: Mar. 24th, 2025).
Providing training data: WikiSpeech is also looking to collect reading samples, which aligns with Lingua Libre's recent capability to record sentences and texts and with the possibility of sharing predefined lists (T313575).
Needed features (?): It would help to provide users who notice a mispronounced word with a preloaded Lingua Libre link (language + word: open the recorder, record, upload with the correct tag). Feature requests can be submitted on Phabricator. A WikiSpeech web developer can easily contribute to the Lingua Libre repository (MariaDB, Django, VueJS). Lingua Libre's list generators could be opened via an API or split into a common service.
Other links
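
The preloaded link mentioned under "Needed features" could be built along these lines. This is only a sketch under stated assumptions: the function name `preloaded_record_link`, the query parameter names `lang` and `word`, and the `https://example.org/record` base URL are all hypothetical placeholders, since no such Lingua Libre endpoint is specified on this page.

```python
from urllib.parse import urlencode


def preloaded_record_link(base_url, language, word):
    """Build a recorder link carrying the language and the word to record.

    `base_url` and the parameter names `lang` and `word` are hypothetical;
    an actual Lingua Libre recorder may expect something different.
    """
    if not language or not word:
        raise ValueError("language and word are both required")
    # urlencode percent-encodes non-ASCII characters as UTF-8.
    query = urlencode({"lang": language, "word": word})
    return f"{base_url}?{query}"


if __name__ == "__main__":
    # Placeholder base URL, for illustration only.
    print(preloaded_record_link("https://example.org/record", "swe", "smörgåsbord"))
```

Encoding the word in the query string keeps the link safe for words with diacritics or spaces, which matters for a service meant to cover many languages.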
Examples
Others
Title | Stack | Workload | Description | Members |
---|---|---|---|---|
Flex / FieldWorks (?) | C/C++/Django | ? | Collaboration with leading lexicographic software to ease co-integration: https://github.com/sillsdev/FieldWorks | ? |