OpenSpeaks/Archives
Gobardhan Panda,
Taukeer Alam,
Uday Raj Aaley
This page documents a research project in progress.
Information may be incomplete and change as the project progresses.
Please contact the project lead before formally citing or reusing results from this page.
This project aims to expand knowledge and citations about a few Indigenous and low-resourced tongues in Wikimedia projects. To do this, we will create accessible audio-visual media depicting oral knowledge in five tongues (Kusunda, Ho, Bonda, Van Gujjari, and Baleswari) so that they can be cited in future. Supported by a Wikimedia grant, the project will engage native speakers and language experts to identify and subtitle archival recordings from five documentary films by Subhashish Panigrahi, all released under Creative Commons licenses (CC BY and CC BY-SA). The project activities will use and improve OpenSpeaks to support language activists and archivists in documenting and disseminating low- and medium-resourced languages.
Background
We identify that existing citation practices, including those within the Wikimedia movement, are deeply problematic and enshrined in post-colonial and other oppressive practices. While we contribute to challenging and changing such practices, their removal is slow and gradual. This project contributes in two aspects: increasing the scope of multimedia oral knowledge in Wikimedia projects, and collaborating with GLAM institutions to increase citations.
Prabhala et al. had previously argued that oral knowledge should be accepted as a citable source on Wikipedia. Euro-colonial settlers and locally dominant groups have systematically repressed and erased the knowledge(s) of several communities. Ironically, Wikipedia's stringent Western-centric citation policies adhere to the same dominant and majoritarian knowledge curation systems. This discourages marginalised individuals and worsens existing on-wiki knowledge gaps. This project pilots a pathway for bringing oral knowledge onto Wikimedia projects and making it citable. Currently, oral knowledge is included within Wikipedia in a very limited scope, such as using speakers' voice recordings to illustrate their language, which undermines its larger value.
We will engage GLAM institutions to acquire and catalogue the materials we will create. Expanding knowledge without citations is strongly discouraged under the no original research criterion. This model aims to keep the uploaded media useful both as media and as citations, even within the current citation policies.
Source documentary films
- Gyani Maiya (Q130366102) (2019) archived the largest amount of Kusunda-language oral history in video format
- Remosam (Q130366103) (2019) documented Bonda festivals and traditional alcohol-making and consumption rituals
- Mage Porob (Q130366104) (2019) includes many important cultural celebrations of the Ho people
- For MarginalizedAadhaar (Q130366105) (2021), ornithologist Taukeer Alam narrated access-to-knowledge issues among Van Gujjars
- Nani Ma (Q130366106) (2022) documented previously unrecorded oral history in Baleswari-Odia
Language-specific sub-projects
Project Name | Focus Language/Dialect | Expert Name |
---|---|---|
गेम्येहाक़ गिपन (Gejmehac Gipan, literally "Kusunda language") | Kusunda (Q33630) (ISO 639-3: kgg) | Uday Raj Aaley |
ମୁନାରେମ (Munaremo) | Bonda (Q2567942) (bfw) and Desia (Q12629755) (dso) | Gobardhan Panda (Q111077171) |
मारी जबान मारी बिरसा (Maari Jaban Maari Birsa, lit. "our language, our heritage") | Gujari (Q3241731) (gju) | Taukeer Alam (Q130314213) |
𑢼𑣃𑣃𑣊 𑢾𑣁𑣌𑣁𑣖 (Ruum Sakam) | Ho (Q33270) (hoc) | Ganesh Birua (Q116546852) |
ବାଲେସରିଆ (Balesoria) | Baleswari Odia (Q4850727) (ory) | Subhashish Panigrahi |
Timeline
- September 2024: Identify and onboard language speakers, activists, or experts (done)
- October 2024: Identify recordings best suited to Wikipedia articles/Wikimedia projects (done)
- November 2024: Subtitle selected recordings (done)
- December 2024: Publish recordings on Wikimedia Commons and use them in Wikipedia/other Wikimedia projects (in progress)
Wikipedia and sister Wikimedia projects to be improved
- Phase 1
- Create/improve Wikipedia articles and Wikidata entries on peoples (larger communities), languages and notable individuals[1]
- w:Kusunda people (improve)
- w:Kusunda language (improve)
- w:Gyani Maiya Sen-Kusunda (improve)
- w:Kamala Sen-Khatri (new - Wikipedia Done)
- w:Magar people (improve)
- w:Uday Raj Aaley (Wikipedia Done and category Done, Wikidata Done, Commons Done)
- w:Van Gujjar people Done and w:Van Gujjari Done
- w:Taukeer Alam (Wikipedia, Wikidata, Commons)
- w:Bonda people
- w:Bonda language (improve)
- w:Gobardhan Panda (Wikipedia, Wikidata, Commons)
- w:Ho people (improve)
- w:Human body (Ho WP Incubator)
- w:Dama (drum) (new)
- w:Dumeng (drum) (new)
- w:Ho language (improve)
- w:Baha Parab (improve)
- w:Mage Parab (improve)
- w:Rasi (drink) (new)
- w:Handia_(drink) (improve)
- Ganesh Birua (Q116546852) (Wikidata Done, Notability to be checked for Wikipedia) (improve)
- Phase 2
- Media uploading and subtitling
Planned activities:
- Share unsubtitled and Wikimedia project-relevant media with language experts
- Experts identify media and subtitle
- Edit and clean up subtitles
- Upload media with subtitles
- Phase 3
- Media inclusion in Wikipedia
Planned activities:
- Identify WP articles
- Insert media
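The Phase 2 step "Edit and clean up subtitles" could be partly automated. The following is a minimal sketch, assuming the experts deliver subtitles as SubRip (SRT) text; the function name `clean_srt` and the clean-up rules (trim whitespace, drop cues without text, renumber from 1) are illustrative assumptions, not the project's documented workflow.

```python
import re

# Matches an SRT timing line such as "00:00:01,000 --> 00:00:04,000".
TIMING = re.compile(r"^\d{2}:\d{2}:\d{2},\d{3} --> \d{2}:\d{2}:\d{2},\d{3}$")


def clean_srt(text):
    """Return SRT text with tidy whitespace, no empty cues, and cues renumbered from 1."""
    # Normalise line endings and drop a leading byte-order mark, if any.
    text = text.replace("\r\n", "\n").replace("\r", "\n").lstrip("\ufeff")
    cues = []
    # Cues are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = [line.strip() for line in block.split("\n") if line.strip()]
        # Locate the timing line; anything before it (the old index) is discarded.
        timing_at = next((i for i, line in enumerate(lines) if TIMING.match(line)), None)
        if timing_at is None:
            continue  # no valid timing line, so this is not a usable cue
        body = lines[timing_at + 1:]
        if not body:
            continue  # cue has timing but no subtitle text
        cues.append((lines[timing_at], body))
    # Rebuild the file with consecutive cue numbers.
    blocks = ["\n".join([str(n), timing, *body]) for n, (timing, body) in enumerate(cues, start=1)]
    return "\n\n".join(blocks) + "\n"
```

A human review by the language expert would still be needed after such a pass, since it only fixes formatting and cannot judge the transcription or translation itself.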
Additional goals and activities
GLAM collaborations
- 1. Acquisition of materials as published literature
Discuss with GLAM institutions, especially libraries and language archives, their acquisition and cataloguing of the created media. These materials will create a pathway for improving Wikipedia citations. Wikipedia currently covers very little about Indigenous peoples, languages, and cultures. While we recognise the Wikimedia movement's current citation gaps, this approach helps add content (and citations) about peoples and cultures that are less represented on Wikimedia projects.
- 2. Creation of Library of Congress Subject Headings (LCSH)
- Request the Library of Congress to create LCSH for subjects within the scope of this project that are missing relevant LCSHs.
- 3. Article contribution to Wikipedia/Wikimedia Incubators
Contribute the media we produce to create or improve Wikipedia articles and Wikimedia Incubator entries, thereby increasing the chances of those Incubator projects becoming live Wikimedia projects.
Output
Wikimedia project and community
Commons
Video | Used in | Notes |
---|---|---|
(video) | w:Van Gujjari | |
(video) | w:Ho people | Included in 8 Wikipedias |
(video) | w:Bonda language | New Wiktionary appendix created and being added to Wikipedia articles |
- Requested on Phabricator (task T381934) that multiple language codes be added to Commons so that these languages appear in the language list when adding subtitles.
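To illustrate why the language codes matter: on Wikimedia Commons, subtitles for a file are stored on a page in the TimedText namespace whose title combines the file name, a language code, and the `.srt` extension. The sketch below builds such titles for the codes listed in the sub-project table above. The helper name `timedtext_title` and the file name `Example.webm` are hypothetical, and whether Commons accepts a given code depends on the language support requested in the Phabricator task.

```python
# Language codes as listed in the sub-project table on this page.
FOCUS_LANGUAGES = {
    "kgg": "Kusunda",
    "bfw": "Bonda",
    "dso": "Desia",
    "gju": "Gujari",
    "hoc": "Ho",
    "ory": "Baleswari Odia",
}


def timedtext_title(file_name, lang_code):
    """Build the TimedText page title for a Commons file and a language code."""
    if lang_code not in FOCUS_LANGUAGES:
        raise ValueError(f"unknown language code: {lang_code}")
    # Accept names with or without the "File:" namespace prefix.
    if file_name.startswith("File:"):
        file_name = file_name[len("File:"):]
    return f"TimedText:{file_name}.{lang_code}.srt"


if __name__ == "__main__":
    # "Example.webm" is a placeholder, not one of the project's uploads.
    for code in FOCUS_LANGUAGES:
        print(timedtext_title("File:Example.webm", code))
```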
Other
- Upon our request, the Participatory Culture Foundation added Kusunda, Bonda/Remosam, Gujari, Sora and Garhwali to the list of languages in Amara, an open-source subtitling platform, making all future translations from and to these languages easier.
- Upon our request, Glottolog has agreed to include Van Gujjari, one of the focus languages, in its upcoming edition.
Call for community participation
- We helped co-design the Wiki Loves Languages edit-a-thon, led by Wikimedian Aliva Sahoo, which ran its first pilot on Odia Wikipedia. XX editors uploaded XX articles. Many of our output media files were used in new and existing Wikipedia articles created or edited during this edit-a-thon and beyond.
Open-source software and libraries
This is a non-exhaustive list of free and open source software and libraries extensively used for this project:
- ExifTool for media conversion and metadata reading and writing
- MediaInfo for metadata reading
- Python
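As an illustration of how these tools can be combined, the following is a minimal sketch of reading a media file's metadata from Python by calling ExifTool with its `-json` option, which prints a JSON list with one object per input file. The function names and the file name in the usage note are assumptions for illustration; the page does not describe the project's actual scripts.

```python
import json
import shutil
import subprocess


def exiftool_command(path):
    """Build the ExifTool call; -json requests machine-readable output."""
    return ["exiftool", "-json", path]


def parse_exiftool_json(output):
    """Return the metadata object for the first file, or an empty dict."""
    records = json.loads(output)
    return records[0] if records else {}


def read_metadata(path):
    """Run ExifTool on one file and return its metadata as a dict."""
    if shutil.which("exiftool") is None:
        raise RuntimeError("exiftool was not found on PATH")
    result = subprocess.run(
        exiftool_command(path), capture_output=True, text=True, check=True
    )
    return parse_exiftool_json(result.stdout)
```

For example, `read_metadata("interview.webm")` (a hypothetical file name) would return a dictionary of tags. The parsing step is kept separate from the subprocess call so that it can be checked without ExifTool being installed.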