
Lingua Libre/GSOC25

From Meta, a Wikimedia project coordination wiki

Lingua Libre extension

This section is currently a draft. You can improve it.
Lingua Libre v.2.0, Homepage

In the field of language diversity, the Wikimedia Foundation and Wikimedia France have supported LinguaLibre.org, a single-page VueJS application for rapidly recording the vocabularies of the world. Over 280 languages and 1.3 million words have been audio-recorded into Wikimedia sites through this open project.
A recent Django/VueJS/MariaDB revamp of the core app broke several important add-ons. Building on the new database, the following need to be revamped or created:

  1. the languages dashboard (legacy),
  2. the search page (legacy),
  3. the statistics pages (legacy: Global stats, Languages, Speakers, Chronological),
  4. a minimal bilingual dictionary system.

This will likely require expanding the APIs.
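
As a rough illustration of the statistics views listed above, the sketch below aggregates a few recording records into global, per-language, per-speaker, and chronological counts. The `Recording` fields and the sample rows are assumptions made up for this example; the actual Lingua Libre database schema and API may look quite different.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Recording:
    # Hypothetical record shape, not the real Lingua Libre schema.
    word: str
    language: str  # ISO 639-3 code
    speaker: str
    recorded_on: date


def compute_stats(recordings):
    """Return global, per-language, per-speaker and per-month counts."""
    recordings = list(recordings)
    return {
        "global": len(recordings),
        "languages": Counter(r.language for r in recordings),
        "speakers": Counter(r.speaker for r in recordings),
        # Chronological view: group by year and month of recording.
        "chronological": Counter(
            r.recorded_on.strftime("%Y-%m") for r in recordings
        ),
    }


# Made-up sample rows, for illustration only.
SAMPLE = [
    Recording("ostal", "oci", "SpeakerA", date(2024, 5, 2)),
    Recording("aiga", "oci", "SpeakerA", date(2024, 5, 9)),
    Recording("maison", "fra", "SpeakerB", date(2024, 6, 1)),
]

stats = compute_stats(SAMPLE)
```

In a Django setting the same counts would more likely come from database aggregation queries; the plain-Python version only shows which numbers each statistics page needs.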

Lingua Libre v.2.0, Recording Studio

WikiSpeech & Lingua Libre integration

This section is currently a draft. You can improve it.

WikiSpeech aims to offer a hyper-multilingual, open-source « Listen to this article » text-to-speech (TTS) service to all Wikipedia projects.
To do so, we want to build a solid pipeline consisting of 1) Lingua Libre's audio sentences and text datasets as training data, 2) automated routines to retrain the TTS ML models up to professional quality, and 3) an online API service which, given an ISO language code and a text, returns the corresponding audio stream. This service would be open to queries from all *.wikipedia.org sites, providing a « Listen to this article » feature to the readers of every Wikipedia.
This project would be supported by Wikimedia Sverige (WikiSpeech), Wikimedia France (Lingua Libre) and Google (GSOC25). You will collaborate with your mentor and Lingua Libre developers.
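
The third step of the pipeline, an API that takes an ISO code and a text and returns an audio stream, could be sketched as below. This is only a minimal sketch under stated assumptions: `MODELS`, `read_aloud`, and `UnsupportedLanguage` are hypothetical names, and the placeholder synthesizer emits silent WAV audio where a trained TTS model would be called.

```python
import io
import wave
from typing import Callable, Dict, Iterator

Synthesizer = Callable[[str], bytes]


def _placeholder_synthesizer(text: str) -> bytes:
    """Stand-in for a trained model: silent WAV, 50 ms per character."""
    sample_rate = 16000
    n_frames = int(sample_rate * 0.05) * len(text)
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 16-bit samples
        wav.setframerate(sample_rate)
        wav.writeframes(b"\x00\x00" * n_frames)
    return buffer.getvalue()


# Hypothetical registry: one synthesizer per ISO 639-3 language code.
MODELS: Dict[str, Synthesizer] = {"oci": _placeholder_synthesizer}


class UnsupportedLanguage(LookupError):
    """Raised when no model is registered for the requested language."""


def read_aloud(iso: str, text: str, chunk_size: int = 4096) -> Iterator[bytes]:
    """Return the audio for `text` in language `iso` as an iterator of chunks."""
    synthesize = MODELS.get(iso)
    if synthesize is None:
        raise UnsupportedLanguage(iso)
    audio = synthesize(text)
    # Chunking lets a web framework stream the response progressively.
    return (audio[i:i + chunk_size] for i in range(0, len(audio), chunk_size))
```

A web layer (for example a Django view) could wrap `read_aloud` in a streaming response and translate `UnsupportedLanguage` into an HTTP error. Keeping the language registry separate from the transport makes it possible to add retrained models without touching the API code.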

  • Tech stack: Python/PyTorch (or equivalent), Django/VueJS, Makefiles or alternative.
  • Size: 350 hours
  • Difficulty: Intermediate
  • Mentor(s): Yug, {TBA}
  • Intern: {Username} TBA
  • Phabricator task: TBA
  • Relevant links: TBA.

Lingua Libre IOT

This section is currently a draft. You can improve it.
A 2024 collaboration with the Occitan Whistle public exhibit led to the development of a prototype interactive map that plays village names in a local endangered language. We would like to create an open-source toolkit for such displays and similar in-real-life (IRL) systems.
See: https://hugolpz.github.io/NamesOfTheLand .

Lingua Libre provides pioneering digital material for local and minority communities. Following the 2024 collaboration with the Occitan Whistle public exhibit and the creation of a physical interactive map, we want to develop real-life, open-source IOT applications that showcase Lingua Libre's linguistic data. Target reusers are cultural exhibits, municipal councils, local communities, and local Wikimedians.

Technology | Item | Works with internet | Allocated time
JS or VueJS, LeafletJS | Interactive map table | Yes | 2 weeks
JS or VueJS | Interactive poster table | Yes | 2 weeks
JS | QR code to webpages for area with internet access | Yes | 2 weeks
Solar powered, screen ? | IOT speaker box with preprogrammed content | Without | 6 weeks

These base demonstrators would be physical, table-sized displays in local museums, where visitors can press the names of villages, places, or objects and hear them in the native language. A complementary idea is physical play boxes along mountain hiking paths where internet access is not available: visitors read the minimal instructions, press the box, and hear the native-language audio for something they see.
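
One possible way to feed the map-based demonstrators is to export places and their recordings as GeoJSON, which LeafletJS can load through `L.geoJSON`. The sketch below is illustrative only: the `Place` fields, the place names, the coordinates, and the audio URLs are placeholders invented for this example, not real Lingua Libre data.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Place:
    # Hypothetical record shape for one point on the map.
    name: str
    latitude: float
    longitude: float
    audio_url: str


def to_geojson(places):
    """Build a GeoJSON FeatureCollection with one Point per place."""
    return {
        "type": "FeatureCollection",
        "features": [
            {
                "type": "Feature",
                # GeoJSON orders coordinates as [longitude, latitude].
                "geometry": {
                    "type": "Point",
                    "coordinates": [p.longitude, p.latitude],
                },
                "properties": {"name": p.name, "audio": p.audio_url},
            }
            for p in places
        ],
    }


# Placeholder data, for illustration only.
PLACES = [
    Place("Village A", 43.0, -0.5, "https://example.org/audio/village-a.wav"),
    Place("Village B", 43.1, -0.4, "https://example.org/audio/village-b.wav"),
]

geojson_text = json.dumps(to_geojson(PLACES), ensure_ascii=False, indent=2)
```

The resulting file can be bundled with the audio files themselves, so the same data could also serve an offline device that has no internet access.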

  • Tech stack: Arduino (or equivalent), minimal web coding ability.
  • Size: 350 hours
  • Difficulty: Intermediate
  • Mentor(s): Yug, {TBA}
  • Intern: {Username} TBA
  • Phabricator task: TBA
  • Relevant links: TBA.

Spell4Wiki & Lingua Libre

This section is currently a draft. You can improve it.
Logo of Spell4Wiki.

Align Spell4Wiki with Lingua Libre and give it access to Lingua Libre's item lists.

  • Tech stack: Android SDK (or equivalent).
  • Size: 350 hours
  • Difficulty: Intermediate
  • Mentor(s): TBA, {TBA}
  • Intern: {Username} TBA
  • Phabricator task: TBA
  • Relevant links: TBA.


Title | Stack | Workload | Description | Members
Flex / FieldWorks (?) | C/C++/Django | ? | Collaboration with leading lexicographic software to ease co-integration. https://github.com/sillsdev/FieldWorks | ?