Collaboration meetings and formalities began in May 2018. Since then, we have surveyed the books in the institution several times and held several staff meetups.
The pilot to digitize rare literature and manuscripts for the Wikisource project started in October 2018 and is still ongoing. It began with 50 books, which were scanned and uploaded by staff members of the institute under the direction of Wikilover90. Subsequent edit-a-thons and proofreading events were held to proofread and validate the books and integrate them into Wikipedia.
To support this, various meetups, training workshops and Wikisource events were held.
To archive open-source content for free use, we are digitizing and documenting Public Domain content in collaboration with different institutions across Punjab. For this we are setting up a Wikimedian-in-residence who has the knowledge, links and qualifications to lead the project and act as liaison with the GLAM partners and Government institutions.
This collaboration aims to increase the availability of Public Domain Punjabi works on Punjabi Wikisource by digitizing the books available at the institution, with a focus on preserving Punjabi heritage and history. Along with that, we would like to start Hindi Wikisource, which is not yet live. We will digitize Urdu, Shahmukhi, and English books along with Punjabi and Hindi works.
Punjabi Wikisource will be the main focus for Library Partners collaborating with Punjabi Wikimedians. They will provide us with books, manuscripts, and archives in Punjabi, Hindi, Shahmukhi, and Urdu. With the help of community resources (volunteers and tools), the files will be scanned, post-processed and uploaded, then indexed, proofread, validated and transcluded on Wikisource, and later integrated into Wikipedia.
Important Public Domain books in Punjabi, Hindi, Sanskrit and English, uploaded under a Creative Commons License.
Wikimedia training workshops to facilitate the digitization program and improve coordination within the Indic Wikisource community, thus strengthening the relationship between the Government GLAM institutes and the Wikimedia community
Teach GLAM staff about the policies and practices of Wikimedia projects and the free copyright licenses they use.
Inform GLAM representatives about the possibilities and scope of the Wikimedia movement and the digitization of content under a CC License.
Promotion of participation in Wikisource
Add content to the Wikimedia Commons and Wikisource sites
The bookshelves are very dusty, and the dust is a strong allergen: anyone who visits and comes into contact with it needs medication. The government Municipal libraries are not in good shape, and this institution has not been cleaned in 13 years.
There is no proper catalogue and no online catalogue, and the books are not on the shelves where they are listed. This forces us to search the dusty shelves manually, but we are finding interesting and important books, which makes the labor and the allergic reactions worth it.
It is challenging to find bibliographical and biographical information about older Punjabi authors in online directories and archives. The work continues, drawing on offline archives and books about the authors.
Initially, there was trouble with OCR due to a lack of Linux devices and of OCR software for Mac. That issue was recently solved for the Indic community, thanks to Jay Prakash, who developed the Indic OCR Tool that is now integrated into all Indic language Wikisources.
Post-processing of the books is a challenge we are still trying to solve. Initially, we had difficulty finding post-processing software for Mac. Ongoing work is slower because of the backlog of post-processing work.
The Fujitsu SV600 scanner that the Punjabi Community currently owns can scan only small books. Larger and thicker books cannot be scanned completely; the lower half of the pages gets cut off. Photographing them with a Sony camera did not produce the right results either.
Poor internet bandwidth caused problems when uploading the books.
While uploading the books, there was an issue with the underlying OCR layer that the software picked up. We had to fix this by saving the files again in different formats.
Proofreading was a challenge. From the beginning of 2017, when Punjabi Wikisource started, until October 2018, fewer than 1170 pages had been proofread on Punjabi Wikisource. For a small project like ours, still in its beginner's phase over those two years, getting the digitized content integrated into Wiki projects was a big challenge. With persistent campaigning via social media and outreach, and by bringing in new volunteers for Punjabi Wikisource, we were able to complete this project.
Manual search through hundreds of books to create raw data for: name of book, name of author, publishing date (if stated on the book), and publishing company
Correspondence via email with the Commissioner's office to finalize details of the agreement
Search in Wikidata items through different queries to find:
Authors of Punjab
Authors who were born in Punjab
Authors who wrote in Punjabi
Authors of India
Authors of India, Pakistan and the British Raj
People of India who spoke Punjabi
People of India and Pakistan without the profession "author"
People who wrote during the British Raj
Search through each Wikidata item and the attached Wikipedia article to verify the information, then either cross it off the list or edit/add information to the Wikidata item
Search in the list of Punjabi authors and poets from Wikipedia
Search in the list of authors from Wikisource
Search in various online archives for the same information
Checked books such as Lekhak Sandarbh Kosh for biographical data, including authors' dates of birth
Consultation with various research scholars and professors about author information and book directories, and to get help with access
Consultation with copyright experts for Indian authors
Search for different online directories, archives and books for author information
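As an illustration only, one of the Wikidata searches listed above (authors who wrote in Punjabi) could be sketched as a SPARQL query built in Python. This is not the query the team actually ran. The helper name `build_author_query` is hypothetical, and the item identifiers for "writer" and "Punjabi" are assumptions that should be checked on Wikidata before use. The query also asks for birth and death dates, since those are the details the author research was trying to confirm.

```python
# Hypothetical helper: builds a SPARQL query string for the Wikidata Query Service.
# It does not contact any server; paste the output into query.wikidata.org to run it.
# The two item identifiers below are assumptions and should be verified on Wikidata.
WRITER_QID = "Q36180"   # assumed to be the item for "writer"
PUNJABI_QID = "Q58635"  # assumed to be the item for the Punjabi language


def build_author_query(language_qid: str,
                       occupation_qid: str = WRITER_QID,
                       limit: int = 100) -> str:
    """Return a SPARQL query for people with a given occupation and language."""
    return f"""
SELECT ?person ?personLabel ?birthDate ?deathDate WHERE {{
  ?person wdt:P106 wd:{occupation_qid} ;   # P106: occupation
          wdt:P1412 wd:{language_qid} .    # P1412: languages spoken, written or signed
  OPTIONAL {{ ?person wdt:P569 ?birthDate . }}   # P569: date of birth
  OPTIONAL {{ ?person wdt:P570 ?deathDate . }}   # P570: date of death
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "pa,en". }}
}}
LIMIT {limit}
""".strip()


if __name__ == "__main__":
    print(build_author_query(PUNJABI_QID, limit=50))
```

The other searches in the list would follow the same pattern with a different property, for example place of birth (P19) instead of language. Results from any such query still need the manual verification against Wikipedia articles and offline sources described above.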