OpenSpeaks/Archives

From Meta, a Wikimedia project coordination wiki
Created
Collaborators: Ganesh Birua, Gobardhan Panda, Taukeer Alam, Uday Raj Aaley
Duration:  2024-09 – 2024-12

This page documents a research project in progress.
Information may be incomplete and change as the project progresses.
Please contact the project lead before formally citing or reusing results from this page.


This project aims to expand knowledge and citations about a few Indigenous and low-resourced tongues in Wikimedia projects. To do this, we will create accessible audio-visual media depicting oral knowledge in five tongues (Kusunda, Ho, Bonda, Van Gujjari, and Baleswari), with a view to citing them in future. Supported by a Wikimedia grant, the project will engage native speakers and language experts to identify and subtitle archival recordings from five documentary films by Subhashish Panigrahi, all released under Creative Commons licenses (CC BY and CC BY-SA). The project activities will use and improve OpenSpeaks to support language activists and archivists in documenting and disseminating low- and medium-resourced languages.

Background

We recognise that existing citation practices, including those within the Wikimedia movement, are deeply problematic and entrenched in post-colonial and other oppressive practices. While we contribute to challenging and changing such practices, that change is slow and gradual. This project contributes in two ways: increasing the scope of multimedia oral knowledge in Wikimedia projects, and collaborating with GLAM institutions to increase citations.

Prabhala et al. have previously argued that oral knowledge should be accepted as citations on Wikipedia. Euro-colonial settlers and locally dominant groups have systematically repressed and erased the knowledge(s) of several communities. Ironically, Wikipedia's stringent Western-centric citation policies adhere to the same dominant and majoritarian knowledge curation systems. This discourages marginalised individuals and worsens existing on-wiki knowledge gaps. Currently, oral knowledge is included within Wikipedia in a very limited scope, such as using speakers' voice recordings to illustrate their language, which undermines its larger value. This project pilots a pathway for bringing oral knowledge onto Wikimedia projects and making it citable.

We will engage GLAM institutions to acquire and catalogue the materials we create. Expanding knowledge without citations is strongly discouraged under the "no original research" criterion. This model aims to keep the uploaded media useful both as media and as citations, even within the current citation policies.

Source documentary films

Language-specific sub-projects

Project Name | Focus Language/Dialect | Expert Name
गेम्येहाक़ गिपन (Gejmehac Gipan, literally "Kusunda language") | Kusunda (Q33630) (ISO 639-3: kgg) | Uday Raj Aaley
ମୁନାରେମ (Munaremo) | Bonda (Q2567942) (bfw) and Desia (Q12629755) (dso) | Gobardhan Panda (Q111077171)
मारी जबान मारी बिरसा (Maari Jaban Maari Birsa, lit. "our language our heritage") | Gujari (Q3241731) (gju) | Taukeer Alam (Q130314213)
𑢼𑣃𑣃𑣊 𑢾𑣁𑣌𑣁𑣖 (Ruum Sakam) | Ho (Q33270) (hoc) | Ganesh Birua (Q116546852)
ବାଲେସରିଆ (Balesoria) | Baleswari Odia (Q4850727) (ory) | Subhashish Panigrahi

Timeline

  • September 2024: Identify and onboard language speakers, activists, or experts (done)
  • October 2024: Identify the recordings best suited to Wikipedia articles and other Wikimedia projects (done)
  • November 2024: Subtitle selected recordings (done)
  • December 2024: Publish recordings on Wikimedia Commons and use them in Wikipedia and other Wikimedia projects (in progress)

Wikipedia and sister Wikimedia projects to be improved

Phase 1
Create/improve Wikipedia articles and Wikidata entries on peoples (larger communities), languages and notable individuals[1]
Phase 2
Media uploading and subtitling

Planned activities:

  • Share unsubtitled media relevant to Wikimedia projects with language experts
  • Experts identify suitable media and subtitle it
  • Edit and clean up subtitles
  • Upload media with subtitles
Phase 3
Media inclusion in Wikipedia

Planned activities:

  • Identify relevant Wikipedia articles
  • Insert media into those articles

Additional goals and activities

GLAM collaborations

1. Acquisition of materials as published literature

Discuss with GLAM institutions, especially libraries and language archives, their acquisition and cataloguing of the created media. These materials will create a pathway for improving Wikipedia citations. Wikipedia currently covers very little about Indigenous peoples, languages, and cultures. While we acknowledge the Wikimedia movement's current citation gaps, this approach helps add content (and citations) about peoples and cultures that are less represented on Wikimedia projects.

2. Creation of Library of Congress Subject Headings (LCSH)
Request the Library of Congress to create LCSH for subjects within the scope of this project that are missing relevant LCSHs.
3. Article contribution to Wikipedia/Wikimedia Incubators

Contribute the media we create to new and existing Wikipedia articles and Wikimedia Incubator entries, thereby increasing the chances of those Incubator projects becoming live Wikimedia projects.

Output

Wikimedia project and community

Commons

Video | Used in | Notes
Ornithologist Taukeer Alam introducing himself in Van Gujjari | w:Van Gujjari |
A Ho farmer sharing his farming experience | w:Ho people | Included in 8 Wikipedias
Gobardhan Panda showing body parts and pronouncing their names in Bonda | w:Bonda language | New Wiktionary appendix created and being added to Wikipedia articles
  • Requested on Phabricator (task T381934) that multiple language codes be added to Commons so that they appear in the list of languages when adding subtitles.

Other

  • Upon our request, the Participatory Culture Foundation added Kusunda, Bonda/Remosam, Gujari, Sora and Garhwali to the list of languages in Amara, an open source subtitling platform, making all future translations from and to these languages easier.
  • Upon our request, Glottolog has agreed to include Van Gujjari, one of the focus languages, in its upcoming edition.

Call for community participation

  • We helped co-design the Wiki Loves Languages edit-a-thon, led by Wikimedian Aliva Sahoo, which ran its first pilot on Odia Wikipedia. XX editors uploaded XX articles. Many of our output media files were used in new and existing Wikipedia articles created or edited during this edit-a-thon and beyond.

Open-source software and libraries

This is a non-exhaustive list of free and open source software and libraries used extensively for this project:

  • ExifTool for media conversion and metadata reading and writing
  • MediaInfo for metadata reading
  • Python
    • Python libraries:
      • Pydub for voice detection in subtitling
      • Whisper for speech-to-text (automatic speech recognition)
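
As an illustration of how voice detection can feed subtitling, the following is a minimal sketch and not the project's actual pipeline. It assumes speech segments are already available as (start, end) pairs in milliseconds, which is the shape of output that a silence detector such as Pydub's `detect_nonsilent` produces. The function names, the 300 ms merge gap and the placeholder text are hypothetical choices made for this example. The sketch merges segments separated by very short pauses and writes an empty SRT skeleton that a language expert could then fill in.

```python
from typing import Iterable, List, Tuple

Segment = Tuple[int, int]  # (start_ms, end_ms)


def ms_to_srt_timestamp(ms: int) -> str:
    """Format a millisecond offset as an SRT timestamp (HH:MM:SS,mmm)."""
    hours, rem = divmod(ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    seconds, millis = divmod(rem, 1_000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


def merge_close_segments(segments: Iterable[Segment],
                         max_gap_ms: int = 300) -> List[Segment]:
    """Join speech segments separated by a pause of at most max_gap_ms."""
    merged: List[Segment] = []
    for start, end in sorted(segments):
        if merged and start - merged[-1][1] <= max_gap_ms:
            # Short pause: extend the previous cue instead of starting a new one.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def build_srt_skeleton(segments: Iterable[Segment],
                       placeholder: str = "[subtitle text]") -> str:
    """Return SRT text with one numbered, timed cue per merged segment."""
    cues = []
    for index, (start, end) in enumerate(merge_close_segments(segments), start=1):
        cues.append(
            f"{index}\n"
            f"{ms_to_srt_timestamp(start)} --> {ms_to_srt_timestamp(end)}\n"
            f"{placeholder}\n"
        )
    return "\n".join(cues)
```

For example, `build_srt_skeleton([(500, 2200), (2400, 4000), (6000, 7500)])` yields two cues, because the first two segments are only 200 ms apart and are merged. The merge threshold would need tuning per recording, since natural pause lengths differ between speakers and languages.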

Notes

  1. Some of the experts in this project are notable but lack Wikipedia BLP (biographies of living persons) articles. We plan to volunteer some hours to create articles about them and to use relevant media created in this project in those articles.