Lingua Libre/GSOC25
The following is a proposed Wikimedia document. References or links to this page should not describe it as supported, adopted, common, or effective.
The proposal is in development; it may still be very experimental, may not work as currently described or intended, and may never be finalized. IMPORTANT: these projects are not confirmed yet; between 0 and 3 of them could be taken forward into GSOC25 or Outreachy Round 30. See also phab:T385383.
Lingua Libre extension
This section is currently a draft. You can improve it.
![](http://upload.wikimedia.org/wikipedia/commons/thumb/6/6d/Lingua_Libre_home_page_2020-12.png/280px-Lingua_Libre_home_page_2020-12.png)
In the field of language diversity, the Wikimedia Foundation and Wikimedia France have supported LinguaLibre.org, a single-page VueJS application for rapidly recording the vocabularies of the world. Over 280 languages and 1.3 million words have been audio-recorded into Wikimedia sites through this open project.
The recent Django/VueJS/MariaDB revamp of the core app broke several important add-ons. Based on the new database, the following need to be revamped or created:
- the languages dashboard (legacy),
- the search page (legacy),
- statistics (legacy: Global stats, Languages, Speakers, Chronological),
- a minimal bilingual dictionary system.
This will likely require expanding the APIs.
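As an illustration of what the statistics views listed above might compute, here is a minimal Python sketch. The `Recording` fields, the `compute_stats` function and the sample data are all hypothetical; the actual schema of the new database and the shape of the APIs are not described here and may differ.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Recording:
    """Hypothetical record shape; the real database schema may differ."""
    language: str      # assumed to be an ISO 639-3 code
    speaker: str
    word: str
    recorded_on: date


def compute_stats(recordings):
    """Aggregate four views: global, languages, speakers, chronological."""
    recordings = list(recordings)
    by_month = Counter(r.recorded_on.strftime("%Y-%m") for r in recordings)
    return {
        "global": {
            "recordings": len(recordings),
            "languages": len({r.language for r in recordings}),
            "speakers": len({r.speaker for r in recordings}),
        },
        "languages": dict(Counter(r.language for r in recordings)),
        "speakers": dict(Counter(r.speaker for r in recordings)),
        # Months sorted chronologically ("YYYY-MM" sorts lexicographically).
        "chronological": dict(sorted(by_month.items())),
    }


# Invented sample data, for illustration only.
sample = [
    Recording("fra", "Alice", "bonjour", date(2024, 1, 5)),
    Recording("fra", "Alice", "merci", date(2024, 1, 9)),
    Recording("oci", "Bob", "adiu", date(2024, 2, 1)),
]
stats = compute_stats(sample)
```

In a Django setting these aggregates would more plausibly be computed by database queries than in memory; the sketch only shows the kind of output an expanded API could expose.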
![](http://upload.wikimedia.org/wikipedia/commons/thumb/3/33/Lingua_Libre_-_Record_Wizard_-_Studio.png/280px-Lingua_Libre_-_Record_Wizard_-_Studio.png)
- Tech stack: VueJS, Django (Python), NodeJS
- Size: 350 hours
- Difficulty: Intermediate
- Mentor(s): Yug, {TBA}
- Intern: {Username} TBA
- Phabricator task: TBA
- Relevant links: Repository (demo), Phabricator dashboard, Lingua Libre.
WikiSpeech & Lingua Libre integration
This section is currently a draft. You can improve it.
![](http://upload.wikimedia.org/wikipedia/commons/thumb/1/1d/Wikispeech_logo.svg/280px-Wikispeech_logo.svg.png)
WikiSpeech aims to offer a hyper-multilingual, open-source « Listen to this article » text-to-speech (T2S) service to all Wikipedia projects.
To do so, we want to create a solid pipeline using 1) Lingua Libre's audio sentences and textual datasets as training data, 2) automated routines to retrain T2S ML models up to professional level, and 3) an online API service which, given an ISO language code and a text, would return the corresponding audio stream. This online service would be open to all *.wikipedia.org queries, providing a « Listen to this article » service to all Wikipedia readers.
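The contract of the proposed API service (ISO code and text in, audio out) can be sketched as follows. This is only an illustration under stated assumptions: `text_to_speech`, the `MODELS` registry, the sample rate and the `silent_model` placeholder are hypothetical names, and the placeholder emits silence rather than speech, standing in for a trained model.

```python
import io
import wave

SAMPLE_RATE = 16000  # assumed sample rate, in Hz


def silent_model(text):
    """Placeholder 'model': 50 ms of 16-bit mono silence per character."""
    n_samples = (SAMPLE_RATE // 20) * len(text)
    return b"\x00\x00" * n_samples


# Hypothetical registry mapping ISO codes to synthesis callables.
MODELS = {"fra": silent_model, "oci": silent_model}


def text_to_speech(iso, text):
    """Return WAV bytes for `text` read in the language `iso`."""
    if iso not in MODELS:
        raise KeyError(f"no model available for language {iso!r}")
    if not text.strip():
        raise ValueError("text must not be empty")
    pcm = MODELS[iso](text)
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)          # mono
        wav.setsampwidth(2)          # 16-bit samples
        wav.setframerate(SAMPLE_RATE)
        wav.writeframes(pcm)
    return buffer.getvalue()


audio = text_to_speech("oci", "adiu")
```

A real service would wrap such a function in an HTTP endpoint and stream the audio; rejecting unknown language codes up front keeps the failure mode explicit for languages without a trained model.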
This project would be supported by Wikimedia Sverige (WikiSpeech), Wikimedia France (Lingua Libre) and Google (GSOC25). You will collaborate with your mentor and Lingua Libre developers.
- Tech stack: Python/PyTorch (or equivalent), Django/VueJS, Makefiles or alternative.
- Size: 350 hours
- Difficulty: Intermediate
- Mentor(s): Yug, {TBA}
- Intern: {Username} TBA
- Phabricator task: TBA
- Relevant links: TBA.
Lingua Libre IOT
This section is currently a draft. You can improve it.
![](http://upload.wikimedia.org/wikipedia/commons/thumb/c/cd/2024_Shiular_d%27Aas_exhibition%2C_Anglet%2C_France-04.jpg/220px-2024_Shiular_d%27Aas_exhibition%2C_Anglet%2C_France-04.jpg)
See: https://hugolpz.github.io/NamesOfTheLand
Lingua Libre provides pioneering digital material for local and minority communities. Following the 2024 collaboration with the Occitan Whistle public exhibit and the creation of a physical interactive map, we want to develop real-life, open-source IOT applications that showcase Lingua Libre's linguistic data. Target reusers are cultural exhibits, municipal councils, local communities, and local Wikimedians.
Technology | Item | Works with internet | Allocated time |
---|---|---|---|
JS or VueJS, LeafletJS | Interactive map table | Yes | 2 weeks |
JS or VueJS | Interactive poster table | Yes | 2 weeks |
JS | QR code to webpages for areas with internet access | Yes | 2 weeks |
Arduino, solar powered, screen (?) | IOT speaker box with preprogrammed content | No (works without) | 6 weeks |
These base demonstrators would create table-sized physical displays in local museums, where visitors could press the names of villages, places, or objects and hear the native-language name for each item. A complementary idea would be physical play boxes along mountain hiking paths where the internet is not available. Visitors could read the minimal instructions, press the box, and hear the native-language audio for something they see.
- Tech stack: Arduino (or equivalent), minimal web coding ability.
- Size: 350 hours
- Difficulty: Intermediate
- Mentor(s): Yug, {TBA}
- Intern: {Username} TBA
- Phabricator task: TBA
- Relevant links: TBA.
Spell4Wiki & Lingua Libre
This section is currently a draft. You can improve it.
![](http://upload.wikimedia.org/wikipedia/commons/thumb/f/f1/Spell4Wiki.png/220px-Spell4Wiki.png)
Align Spell4Wiki and Lingua Libre, and access Lingua Libre's item lists from Spell4Wiki.
- Tech stack: Android SDK (or equivalent).
- Size: 350 hours
- Difficulty: Intermediate
- Mentor(s): TBA, {TBA}
- Intern: {Username} TBA
- Phabricator task: TBA
- Relevant links: TBA.
Others
Title | Stack | Workload | Description | Members |
---|---|---|---|---|
Flex / FieldWorks (?) | C/C++/Django | ? | collaboration with leading lexicographic software to ease co-integration https://github.com/sillsdev/FieldWorks | ? |