Lingua Libre/GSOC25

The following is a proposed Wikimedia document. References or links to this page should not describe it as supported, adopted, common, or effective.

The proposal is in development: it may still be very experimental, may not work as currently described or intended, and may never be finalized.

IMPORTANT: these projects are not confirmed yet; between 0 and 3 of them could be taken into GSOC25 or Outreachy/Round 30. See also phab:T385383.

Lingua Libre extension

This section is currently a draft. You can improve it.

In the field of language diversity, the Wikimedia Foundation and Wikimedia France have supported LinguaLibre.org, a single-page VueJS application to rapidly record the vocabularies of the world. Over 280 languages and 1.3 million words have been audio recorded into Wikimedia sites through this open project.
The recent Django/VueJS/MariaDB revamp of the core app broke meaningful add-ons. These front-end features should be rebuilt upon the new database:

overall languages dashboard (legacy)
versatile search page (legacy)
statistics (legacy: Global stats, Languages, Speakers, Chronological)
minimal bilingual dictionaries system, ideally with a minimalist micro-learning feature (legacy: 1, 2)

This will likely require extending the Lingua Libre APIs as well.
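As a rough illustration of what the rebuilt statistics views would need from the back end, here is a minimal sketch in Python. The `Recording` shape and the `summarize` function are hypothetical and do not reflect the actual Lingua Libre database schema or API; they only mirror the four legacy views named above (global, languages, speakers, chronological).

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Recording:
    """Hypothetical record shape; the real Lingua Libre schema may differ."""
    language: str      # language code, e.g. "oci"
    speaker: str       # speaker identifier
    recorded_on: date  # date of the recording


def summarize(recordings):
    """Aggregate recordings into global, per-language, per-speaker and per-month counts."""
    recordings = list(recordings)
    return {
        "global": {
            "recordings": len(recordings),
            "languages": len({r.language for r in recordings}),
            "speakers": len({r.speaker for r in recordings}),
        },
        "by_language": dict(Counter(r.language for r in recordings)),
        "by_speaker": dict(Counter(r.speaker for r in recordings)),
        # Chronological view, grouped by year and month ("YYYY-MM").
        "by_month": dict(Counter(r.recorded_on.strftime("%Y-%m") for r in recordings)),
    }
```

In a real Django setup these counts would more likely come from database aggregation queries exposed through an API endpoint; the in-memory version above only shows the expected shape of the result.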

Tech stack: VueJS, Django (Python), NodeJS, MariaDB
Size: 350 hours
Difficulty: Intermediate
Mentor(s): Yug, {TBA}
Intern: {Username} TBA
Phabricator task: TBA
Relevant links: Repository (demo), Phabricator dashboard, Lingua Libre.

Lingua Libre IOT

https://wiki.openstreetmap.org/wiki/Google_Summer_of_Code/2025/Project_ideas#Endangered_languages_toponyms_map_tool

This section is currently a draft. You can improve it.

Lingua Libre provides pioneering digital material for local communities and minorities. Following the 2024 collaboration with the Occitan Whistle public exhibit and the creation of a physical interactive map, we want to develop real-life, open source IOT applications that showcase Lingua Libre linguistic data. Target reusers are cultural exhibits, municipal councils, local communities, and local Wikimedians.

Technology	Item	Works with internet	Allocated time
JS or VueJS, LeafletJS	Interactive map table	Yes	2 weeks
JS or VueJS	Interactive poster table	Yes	2 weeks
JS	QR code to webpages for area with internet access	Yes	2 weeks
Arduino Solar powered Screen ?	IOT speaker box with preprogrammed content	Without	6 weeks

These base demonstrators would produce physical, table-sized displays for local museums, where visitors could press on the names of villages, places, or objects and hear the native language name for each item. A complementary idea would be physical play boxes on mountain hiking paths where the internet is not available. Visitors could read the minimal instructions, press the box, and hear the native language audio for something they see.
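For the interactive map demonstrator, one possible data format is GeoJSON, which LeafletJS can load through `L.geoJSON`. The sketch below assumes a hypothetical list of places, each with a name, a native-language name, coordinates and an audio URL; the field names and the sample URL are illustrative only and are not taken from Lingua Libre.

```python
import json


def places_to_geojson(places):
    """Convert a list of place dicts into a GeoJSON FeatureCollection.

    Each place is assumed (hypothetically) to carry the keys
    "name", "native_name", "lon", "lat" and "audio_url".
    """
    features = []
    for place in places:
        features.append({
            "type": "Feature",
            # GeoJSON orders coordinates as [longitude, latitude].
            "geometry": {
                "type": "Point",
                "coordinates": [place["lon"], place["lat"]],
            },
            "properties": {
                "name": place["name"],
                "native_name": place["native_name"],
                "audio_url": place["audio_url"],
            },
        })
    return {"type": "FeatureCollection", "features": features}


# Illustrative entry; the audio URL is a placeholder.
sample = [{
    "name": "Toulouse",
    "native_name": "Tolosa",
    "lon": 1.444,
    "lat": 43.604,
    "audio_url": "https://example.org/audio/tolosa.wav",
}]
geojson_text = json.dumps(places_to_geojson(sample), ensure_ascii=False)
```

On the map side, a click handler on each feature could then play the file found in `audio_url`. For the offline speaker box, the same lookup (item to audio file) would have to be preloaded on the device.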

Tech stack: Arduino (or equivalent), minimal web coding ability.
Size: 350 hours
Difficulty: Intermediate
Mentor(s): Yug, {TBA}
Intern: {Username} TBA
Phabricator task: TBA
Relevant links: TBA.

Spell4Wiki & Lingua Libre

This section is currently a draft. You can improve it.

Align Spell4Wiki with Lingua Libre, and let Spell4Wiki access Lingua Libre's item lists.

Tech stack: Android SDK (or equivalent).
Size: 350 hours
Difficulty: Intermediate
Mentor(s): TBA, {TBA}
Intern: {Username} TBA
Phabricator task: TBA
Relevant links: TBA.

WikiSpeech & Lingua Libre integration

This section is currently a draft. Cancelled. The WikiSpeech team confirmed their TTS project already has academic researchers on it with no clear need for a GSOC intern.

WikiSpeech aims to offer a hyper-multilingual, open source « Listen to this article » Text To Speech (TTS) service to all Wikipedia projects. To do so, we want to create a solid pipeline using 1) Lingua Libre's audio sentences and textual datasets as training data, 2) automated routines to retrain TTS ML models up to professional level, and 3) an online API service which, given an ISO language code and a text, would return the corresponding audio reading stream. This online service would be open to all `*.wikipedia.org` queries, providing a « Listen to this article » service to all Wikipedia readers. This project would be supported by Wikimedia Sverige (WikiSpeech), Wikimedia France (Lingua Libre) and Google (GSOC25). You will collaborate with your mentor and Lingua Libre developers.

Tech stack: Python/PyTorch (or equivalent), Django/VueJS, Makefiles or alternative.
Size: 350 hours
Difficulty: Intermediate
Mentor(s): (cancelled)
Intern: (cancelled)
Phabricator task: (cancelled)
Relevant links: (cancelled)

WikiSpeech-Lingua Libre meetup

Date: 2025.02.12
Participants: André, Sebastian, Yug
Topic: « WikiSpeech-Lingua Libre meetup, exploring possible collaborations »

Summary: WikiSpeech's unique TTS challenge is to provide a TTS service while keeping a wiki-like correction feedback channel, so that key words in Wikipedia articles are read as accurately as possible and can be corrected rapidly, in multiple languages. WikiSpeech's TTS expertise is provided by Swedish machine learning (ML) research centers, which do not need a dedicated ML GSOC25 intern. Still, Yug notes that this remains possible if wanted (deadline: Mar. 24th, 2025).

Providing training data: WikiSpeech is also looking to collect reading samples, which aligns with Lingua Libre's recent sentence and text recording capability and the possibility to share predefined lists (T313575).

Needed features (?): Providing users who notice a mispronounced word with a preloaded Lingua Libre link (language + word: open the recorder, record, upload with the correct tag) would help. Feature requests can be submitted on Phabricator. A WikiSpeech web developer can easily contribute to the Lingua Libre repository (MariaDB, Django, VueJS). Lingua Libre's list generators could be opened via an API or split into a common service.

Other links:
https://github.com/stts-se/wikispeech-manuscriptor
https://github.com/shivammehta25/Matcha-TTS
https://lingualibre.org/wiki/Help:Homographs#Homographs_non-homophones
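The preloaded link mentioned under "Needed features" could be generated along these lines. This is only a sketch: the base URL and the `lang` and `word` parameter names are placeholders chosen for illustration, since no such recorder URL scheme is defined in these notes.

```python
from urllib.parse import urlencode

# Placeholder address; the real recorder URL and its parameters are not defined yet.
RECORDER_URL = "https://example.org/recorder"


def build_record_link(language, word, base_url=RECORDER_URL):
    """Return a link that would open the recorder preloaded with a language and a word.

    Both parameter names ("lang", "word") are assumptions made for this sketch.
    """
    if not language or not word:
        raise ValueError("both language and word are required")
    # urlencode percent-encodes non-ASCII characters as UTF-8.
    query = urlencode({"lang": language, "word": word})
    return f"{base_url}?{query}"


link = build_record_link("fra", "château")
```

A TTS front end could attach such a link to a "report mispronunciation" action, so that the reader lands directly on the recorder with the right language and word already filled in.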

Examples

Others

Title	Stack	Workload	Description	Members
Flex / FieldWorks (?)	C/C++/Django	?	collaboration with leading lexicographic software to ease co-integration https://github.com/sillsdev/FieldWorks	?
