This is a GLAM collaboration between the Wikimedia community in Punjab and the R.V.D.J. Municipal Public Library, Patiala.
Images: Rajindra Victoria Silver Jubilee Municipal Public Library; Municipal Library Patiala; Municipal Corporation Library Patiala; The Municipal Library Patiala; Municipal Library Patiala 3
Languages for Books and Manuscripts to be digitized
The collaboration meetings and formalities started in May 2018. Since then, we have surveyed the institution's books several times and held several staff meetups.
The pilot started in October 2018 and is still ongoing, digitizing rare literature and manuscripts for the Wikisource project. The initial pilot started with 50 books, which were scanned and uploaded by staff members of the institute under the direction of Wikilover90. Subsequent edit-a-thons and proofreading events were held to proofread and validate the books and integrate them into Wikipedia.
For this, various meetups, training workshops and Wikisource events were organized.
In order to archive open-source content for free use, we are digitizing and documenting Public Domain content in collaboration with different institutions across Punjab by setting up a Wikimedian-in-residence who has the knowledge, links and qualifications to lead the project and act as liaison with the GLAM partners and government institutions.
This collaboration aims to increase the availability of Public Domain Punjabi works on Punjabi Wikisource by digitizing the books available at the institution, with a focus on preserving Punjabi heritage and history. Along with that, we would like to start Hindi Wikisource, which is not yet live. We will digitize Urdu, Shahmukhi, and English books along with Punjabi and Hindi works.
Punjabi Wikisource will be the main focus for library partners collaborating with Punjabi Wikimedians. They will provide us with books, manuscripts, and archives in Punjabi, Hindi, Shahmukhi, and Urdu. With the help of community resources (volunteers and tools), the scanned, post-processed and uploaded files will be indexed, proofread and validated, then transcluded on Wikisource and later integrated into Wikipedia.
Important books in Punjabi, Hindi, Sanskrit and English that are in the Public Domain, uploaded under a Creative Commons license.
Wikimedia training workshops to facilitate the digitization program and improve coordination within the Indic Wikisource community, thus strengthening the relationship between government GLAM institutes and the Wikimedia community.
Teaching GLAM staff about the policies and practices of Wikimedia projects and the free copyright licenses they use.
Informing GLAM representatives about the possibilities and scope of the Wikimedia movement and the digitization of content under a CC license.
Promoting participation in Wikisource.
Adding content to Wikimedia Commons and Wikisource.
The bookshelves are very dusty, and the dust is a strong allergen: anyone who visits and comes into contact with it needs medication. The government municipal libraries are not in good shape, and this institution has not been cleaned in 13 years.
There is no proper catalogue and no online catalogue, and the books are not on their stated shelves. This means a lot of manual searching through dusty shelves, but we are finding interesting and important books, which makes it worth the labor and the allergic reactions that come with it.
It is challenging to find bibliographical and biographical information in online directories and archives for older Punjabi authors. The work continues, drawing on offline archives and books about the authors.
Initially, there was trouble with OCR due to the lack of Linux devices and of software for Mac, but that issue was recently solved for the Indic community thanks to Jay Prakash, who developed the Indic OCR tool that is now integrated into all Indic-language Wikisources.
Post-processing of the books is a challenge we are still trying to solve. Initially, we had difficulty finding post-processing software for Mac. Ongoing work is slower because of the accumulated post-processing work.
The Fujitsu SV600 scanner that the Punjabi community currently owns can scan only small books. Larger and thicker books cannot be scanned completely, because the lower half of the pages gets cut off. Photographing them with a Sony camera did not produce good results either.
Poor internet bandwidth caused problems when uploading the books.
While uploading the books, there was an issue with the underlying OCR layer that was picked up by the software. We had to fix that by saving the files again in different formats.
Proofreading was a challenge. Punjabi Wikisource had fewer than 1170 pages proofread between its start at the beginning of 2017 and October 2018. For a small project like ours, still in its beginner's phase over the past two years, getting the digitized content integrated into Wiki projects was a big challenge. With persistent campaigning via social media and outreach, and by bringing in new volunteers for Punjabi Wikisource, we were able to complete this project.
Manual search through hundreds of books to create raw data for: name of book, name of author, publishing date (if stated on the book), publishing company
Correspondence by email with the Commissioner's office to finalize details of the agreement
Search in Wikidata items through different queries to find:
Authors of Punjab
Authors who were born in Punjab
Authors who wrote in Punjabi
Authors of India
Authors of India, Pakistan and the British Raj
People of India who spoke Punjabi
People of India and Pakistan without the profession "author"
People who wrote during the British Raj
Review of each Wikidata item and its attached Wikipedia article to verify the information, then either crossing the item off the list or editing/adding information on the Wikidata item
Search in the list of Punjabi authors and poets on Wikipedia
Search in the list of authors on Wikisource
Search in various online archives for the same information
Checking books such as Lekhak Sandarbh Kosh for biographical data such as authors' dates of birth
Consultation with various research scholars and professors about author information and book directories, and to resolve access issues
Consultation with copyright experts regarding Indian authors
Search through different online directories, archives and books for author information
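The Wikidata searches listed above (authors born in Punjab, people who spoke Punjabi, and so on) could be expressed as SPARQL queries. The following is only a minimal sketch of how such a query might be assembled, not the queries actually used in the project. The function name `build_author_query` is hypothetical, and the item IDs passed in the usage example (`Q22424` for Punjab, India and `Q58635` for the Punjabi language) are assumptions that should be verified on Wikidata before use. Matching only the occupation "writer" (`Q36180`) is a simplification; poets or novelists recorded under other occupations would be missed.

```python
import re

# Public Wikidata Query Service endpoint; the query string built below can be
# pasted into its web interface or sent to it with any HTTP client.
WDQS_ENDPOINT = "https://query.wikidata.org/sparql"

_QID = re.compile(r"Q[1-9][0-9]*")


def build_author_query(region_qid=None, language_qid=None, limit=500):
    """Build a SPARQL query for writers, optionally filtered by birthplace
    region and by a language they spoke or wrote (hypothetical helper)."""
    for qid in (region_qid, language_qid):
        if qid is not None and not _QID.fullmatch(qid):
            raise ValueError(f"not a Wikidata item ID: {qid!r}")

    # P106 = occupation, Q36180 = writer
    patterns = ["?person wdt:P106 wd:Q36180 ."]
    if region_qid:
        # P19 = place of birth, P131 = located in the administrative
        # territorial entity; the * follows the chain up to the region.
        patterns.append("?person wdt:P19 ?birthplace .")
        patterns.append(f"?birthplace wdt:P131* wd:{region_qid} .")
    if language_qid:
        # P1412 = languages spoken, written or signed
        patterns.append(f"?person wdt:P1412 wd:{language_qid} .")

    body = "\n  ".join(patterns)
    return (
        "SELECT DISTINCT ?person ?personLabel WHERE {\n"
        f"  {body}\n"
        '  SERVICE wikibase:label { bd:serviceParam wikibase:language "pa,en". }\n'
        "}\n"
        f"LIMIT {int(limit)}"
    )


# Usage: both IDs are assumptions (Punjab, India and Punjabi); verify first.
query = build_author_query(region_qid="Q22424", language_qid="Q58635", limit=100)
print(query)
```

Each result would still need the manual check against the Wikidata item and its Wikipedia article that the steps above describe.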
Wikisource stats before the pilot project started in October 2018
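The first six numeric columns describe the Page namespace (pages of books); the remaining five describe the Main namespace (articles).

| Language | Page ns: all pages | Page ns: not proofread | Page ns: problematic | Page ns: without text | Page ns: proofread | Page ns: validated | Main ns: all pages | Main ns: with scans | Main ns: without scans | Main ns: disambiguation | Main ns: percent |
|---|---|---|---|---|---|---|---|---|---|---|---|
| te | 47726 | 13502 | 39 | 1098 | 33087 | 24314 | 13118 | 3986 | 9132 | 0 | 30.39 |
| bn | 708658 | 684743 | 566 | 6789 | 16560 | 7474 | 7629 | 7599 | 15 | 15 | 99.80 |
| ta | 403283 | 387310 | 24 | 75 | 15874 | 7804 | 5768 | 1521 | 4247 | 0 | 26.37 |
| gu | 13048 | 1372 | 9 | 280 | 11387 | 8870 | 5777 | 1550 | 4227 | 0 | 26.83 |
| ml | 20849 | 12326 | 130 | 307 | 8086 | 671 | 6397 | 717 | 5680 | 0 | 11.21 |
| sa | 43557 | 39858 | 147 | 142 | 3410 | 2080 | 19482 | 216 | 19266 | 0 | 1.11 |
| kn | 48767 | 44805 | 26 | 481 | 3455 | 1190 | 21035 | 118 | 20917 | 0 | 0.56 |
| or | 6932 | 3815 | 3 | 50 | 3064 | 530 | 667 | 96 | 566 | 5 | 14.50 |
| pa | 5064 | 3747 | 6 | 42 | 1269 | 381 | 138 | 25 | 113 | 0 | 18.12 |
| as | 1470 | 805 | 0 | 5 | 660 | 159 | 1365 | 31 | 1334 | 0 | 2.27 |
| mr | 17609 | 16922 | 36 | 11 | 640 | 24 | 1496 | 1 | 1495 | 0 | 0.07 |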
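The percent column is not defined on this page. The listed values are consistent with the share of main-namespace pages that are backed by scans, with disambiguation pages left out. The sketch below checks that reading against a few rows of the October 2018 figures. The formula and the function name `scan_backed_percent` are inferences from the numbers, not something documented by the statistics tool.

```python
def scan_backed_percent(with_scans: int, without_scans: int) -> float:
    """Percentage of main-namespace pages that are backed by scans.

    Assumed formula (inferred from the table, not stated in the source):
    with_scans / (with_scans + without_scans) * 100.
    """
    total = with_scans + without_scans
    if total == 0:
        return 0.0
    return round(100 * with_scans / total, 2)


# (with scans, without scans, percent as listed) for October 2018
rows = {
    "te": (3986, 9132, 30.39),
    "bn": (7599, 15, 99.80),
    "or": (96, 566, 14.50),
    "pa": (25, 113, 18.12),
}

for lang, (with_scans, without_scans, listed) in rows.items():
    computed = scan_backed_percent(with_scans, without_scans)
    status = "matches" if computed == listed else "differs"
    print(f"{lang}: computed {computed:.2f}, listed {listed:.2f} ({status})")
```

For bn and or, the disambiguation pages (15 and 5) account for the gap between "all pages" and the sum of the two scan columns.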
Current Indic Wikisource stats in January 2019
Statistics as of Saturday, 26 January 2019, 12:01 PM
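The first six numeric columns describe the Page namespace; the remaining four describe the Main namespace.

| Language | Page ns: all pages | Page ns: without text | Page ns: not proofread | Page ns: problematic | Page ns: proofread | Page ns: validated | Main ns: all pages | Main ns: with scans | Main ns: without scans | Main ns: % |
|---|---|---|---|---|---|---|---|---|---|---|
| as | 2869 | 29 | 1747 | 1 | 1092 | 604 | 1402 | 56 | 1346 | 3.99 |
| bn | 709814 | 6827 | 685134 | 572 | 17281 | 7537 | 7551 | 7521 | 16 | 99.79 |
| gu | 13887 | 306 | 1140 | 9 | 12432 | 9999 | 6001 | 1777 | 4224 | 29.61 |
| kn | 50281 | 559 | 45867 | 45 | 4068 | 1810 | 21056 | 131 | 20925 | 0.62 |
| ml | 46863 | 2050 | 12501 | 131 | 32181 | 688 | 6416 | 729 | 5687 | 11.36 |
| mr | 18454 | 11 | 17569 | 36 | 838 | 28 | 1528 | 1 | 1527 | 0.07 |
| or | 7131 | 57 | 3338 | 3 | 3733 | 2112 | 656 | 99 | 552 | 15.21 |
| pa | 11060 | 188 | 4491 | 47 | 6334 | 551 | 149 | 31 | 118 | 20.81 |
| sa | 52386 | 149 | 47305 | 24 | 4907 | 2892 | 20434 | 437 | 19997 | 2.14 |
| ta | 403344 | 86 | 383253 | 53 | 19952 | 9657 | 6034 | 1793 | 4241 | 29.71 |
| te | 49724 | 1165 | 15123 | 68 | 33368 | 24713 | 13113 | 3971 | 9142 | 30.28 |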
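To make the change for Punjabi (pa) between the two snapshots easier to read, the Page-namespace figures from the October 2018 and January 2019 statistics can be compared directly. This is a small illustrative calculation using only numbers copied from the statistics above. The `PageStats` class and the helper names are hypothetical and not part of any Wikimedia tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PageStats:
    """Page-namespace counts for one language at one point in time."""
    all_pages: int
    proofread: int
    validated: int


# Punjabi (pa) figures copied from the two statistics snapshots
pa_oct_2018 = PageStats(all_pages=5064, proofread=1269, validated=381)
pa_jan_2019 = PageStats(all_pages=11060, proofread=6334, validated=551)


def growth(before: PageStats, after: PageStats) -> dict:
    """Absolute increase in each count between two snapshots."""
    return {
        "all_pages": after.all_pages - before.all_pages,
        "proofread": after.proofread - before.proofread,
        "validated": after.validated - before.validated,
    }


def growth_factor(before: int, after: int) -> float:
    """How many times larger the later count is, rounded to 2 decimals."""
    return round(after / before, 2)


print(growth(pa_oct_2018, pa_jan_2019))
print(growth_factor(pa_oct_2018.proofread, pa_jan_2019.proofread))
```

By these figures, the number of proofread Punjabi pages grew by 5065, to roughly five times the October 2018 count.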
Stats of proofreading done between December 14, 2018 and February 1, 2019, which made Punjabi Wikisource the fastest-growing and most active community globally in terms of content and editor growth