Grants:PEG/Europeana/GLAMwiki Toolset
- Withdrawn
- See the discussion page of this grant request for more information.
- Since this grant was withdrawn, no report is required and no further action is required from the grantee.
(~$96,100 USD)
mass upload system for Wikimedia Commons”
Goal
This grant proposes to extend the functionality of the GLAMwiki Toolset and greatly improve its usability.
The GLAMwiki Toolset (GWT) is the first standardised mass-upload system to Wikimedia Commons. Until now, all mass-uploads have been undertaken with bespoke scripts/bots that are highly customisable but have extremely low scalability. The current implementation of the GWT demonstrates that it is possible to lower the barrier to entry for mass uploads.
Context
Following the first round of development (project documentation & project initiation plan), the GWT is now being used successfully by GLAMs, other organisations and individual wikimedians to bring content to Wikimedia Commons. Although created by a non-Wikimedia organisation (Europeana), it was built as an open-source MediaWiki extension (see the technical architecture and design) in a manner compliant with the best practices of the community. It is the only “integrated upload tool” for Wikimedia Commons designed for multiple simultaneous uploads (aside from the WMF-developed Upload Wizard) and is part of the operating landscape for Wikimedia Commons software development, notably with the Wikimedia “structured data” team.
Much of the funding for the first round came from a consortium of Wikimedia Chapters; for various reasons this funding route is not viable for the next phase. In particular, it is more appropriate and efficient for a non-modular development to get one funding decision from one part of the Wikimedia movement rather than to combine several smaller grants from different parts of the movement.
However, GWT has not reached a software development stage where it can be used by GLAMs without assistance. Currently, the primary users of the tool are those people already familiar with Wikimedia Commons who have the technical capability for mass-upload using their own scripts. Hence, users of the GWT who are not already intimately familiar with Wikimedia Commons still require large amounts of personalised advice and support from Commons experts. Even in the best-case scenario, a successful upload from a new user takes several days to prepare, train, receive permissions, troubleshoot and upload.
GWT 2.0 therefore needs to become user-friendly enough for GLAMs to operate without the need for direct “hand-holding” from Wikimedia Commons experts. It also needs to include other key desirable features that are clear from GLAMs’ feedback about real-world usage of the current implementation.
Need
Experiential and anecdotal evidence demonstrates that the current system, and prior to it the lack of any integrated system, is causing frustration and unfulfilled multimedia partnerships. Due to the difficulty of explaining the complex processes of Wikimedia, in some instances the "easiest solution" has been to ask GLAMs to place their material on Flickr first, from where it can be imported by existing Wikimedia bots. While metadata-mapping from one database to a different system will not, and should not, ever become a "one click" operation, the fact that there has been no change in the "official" methods for batch uploading to Wikimedia Commons for years has had a cumulative negative effect on our reputation and responsiveness. In spite of well-known organisational difficulties and upheavals at Yahoo!, the "FlickrCommons" project is often considered by GLAMs to be a higher priority for sharing their content due to its ease of use.
Over the years, the Wikimedia Foundation has consistently confirmed that it does not wish to become directly involved in supporting GLAM activities or building related tools, but that it nevertheless sees this as important:
This availability of appropriate technology is clearly increasingly a distinguishing factor for Wikimedia relative to more commercial offerings in its appeal to the cultural sector.
At the same time, WMF itself doesn't currently prioritize work with the cultural sector very highly, which I think is appropriate given all the other problems we have to solve. So if this kind of work has to compete for attention with much more basic improvements to say the uploading pipeline or the editing tools, it's going to lose. Therefore I think having a "cultural tooling" team or teams in the larger movement would be appropriate.— Erik Moeller, Wikimedia-L, 26 June, 2014
Impact
As detailed in the Plan section below, the major results from work done with this grant will be:
There are other expected and possible results such as: better preparation for structured metadata; other input formats accepted; an improved user interface; and a User Group as well as improved documentation; screencasts; split value mapping of multi-value fields; supported non-flat metadata formats and preset mapping examples for EDM.
Fit to strategy
Primary: Improve Quality
The idea that digitised cultural heritage information should be publicly available for reuse is now a mainstream concept around the world (e.g. the OpenGov and OpenGLAM communities) and the steady increase of the visibility, variety and scale of GLAMwiki projects is testament to this. Some activities, like setting up “wikipedian in residence” programs, can grow organically, but the infrastructure needed to successfully undertake popular GLAM activities does not. Given that mass multimedia donation is a common GLAMwiki activity, that the number of these donations is increasing, and that the quality (and provenance) of the donated content is the best available, the Primary fit to strategy is to Improve Quality.
This grant application fits specifically within the scope of the WMF’s promise to “Provide project funding for efforts to connect Wikimedia projects with the work of institutions of culture and learning” which is written in the “Wikimedia Movement Strategic Plan Summary - Improve Quality”.
- Eurasian Spoonbill (2011), Stichting Natuurbeelden collection
- The White rabbit - from The Nursery Alice (1890), British Library collection
- Piazza di Monte Citorio (1980s), Library of Congress collection
- Ship salvaging the wreckage of the Tay Bridge (1880), National Library of Scotland collection
- Portrait of actor Sawamura Sojuro III in the role of Kakogawa Honzo (1795), Rijksmuseum collection
- Broome County Alms House, Binghamton, N.Y. (1876), New York Public Library collection
- Couvent des Filles-de-la-Croix (1892), collection of the French Ministry of Culture
- Art Blakey at the Umeå jazz festival (1979), Collection of the Swedish National Heritage Board
Secondary: Increasing Participation
If successful, the primary outcome of the GLAMwikiToolset will be uploads of sets of high quality multimedia for use in Wikimedia projects directly by GLAMs and other external groups themselves. There is a pent-up demand from GLAMs to share their own high quality content with Wikimedia but the number of people who can assist them has not grown to meet the demand. Therefore, the Secondary fit to strategy is to Increase Participation, particularly among the professional GLAM sector. This toolset should make it easy for external organisations to share their own content with Wikimedia, to enable it to be found and used easily. By extension, this will also increase Wikimedia participation as the GLAMs use their own organisational capacity to promote volunteer activities in Wikimedia through local events and in partnerships with Chapters.
Therefore, in accordance with the “Wikimedia Movement Strategic Plan Summary - Increase Participation” this grant will directly help to “develop new tools”, “support volunteer initiatives” and to “Support Wikimedia Chapters”.
Measures of success
- Increased number of users of the GWT, especially from individuals in the GLAM sector and Wikimedians who have never previously undertaken mass-uploads. The greater the number of users of the GWT, irrespective of the total number of files they have uploaded, the more it is demonstrably incorporated into the standard practices of Wikimedia Commons. Furthermore, if those users are new to mass-uploading, rather than changing from their previous method to the GWT, this demonstrates increased usability. This metric is tracked with the “commons uploaders in cat[egory]” tool, searching for uploads in Category:GWToolset_Batch_Upload, and with the user list on the GWToolset users Commons special page.
- Increased number of uploaded datasets. This is highly related to “Global metric #4: Number of new images/media added to Wikimedia articles/pages”. However, rather than falling into a trap of merely counting the raw number of images uploaded it is more productive to talk about the number of new collections that have been uploaded. The nature of the GLAM itself will determine what content is most valuable to Wikimedia, not merely the raw number of files.
- Increased proportion of uploaded material used in Wikipedia articles. Demonstrating the re-use potential of mass-uploaded content, especially within Wikipedia articles, is crucial to arguing why GLAMs should share their digitised material under free licenses, and in high quality. Using tools such as Commons delinker to find-replace lower quality versions of the same image is a direct way to improve quality. The usage-rate (and combined pageview count) of mass uploads can be tracked with the baGLAMa tool. It should also be noted that this grant is focused solely on the infrastructure to facilitate mass-uploads. The task of generating community enthusiasm to categorise and use the uploaded material remains with the volunteer/GLAM/Chapter who undertook the upload.
Plan
Tasks
In determining this list of important yet viable additional features and system improvements it has been necessary to also declare other popular ideas to be out of scope for this grant request. Prioritisation was based on a combination of difficulty and significance, but also on an attempt at software coherence: the GLAMwikiToolset should not have half-finished elements which would depend on future development funding in order to work. Items listed here are in order of priority according to the MoSCoW method - Must, Should, Could and Won’t.
Each item has an estimate of development time, and therefore cost, associated with it. There are two methodologies used, depending on the item:
- Timeboxed items are those whose scope is difficult to estimate before development begins. Furthermore, they suffer from the law of diminishing returns. In these cases a set number of developer-days is allocated to working on the elements within that item, after which development would cease.
- Three point estimate items are those which have clearer scope and have a specific success/failure point. We have estimated a most likely, best case, and worst case scenario for each of these and then provided our best estimate using the formula suggested by the Wikipedia article, E = (a + 4m + b) / 6, where a is the best case, m the most likely case and b the worst case.
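As a worked illustration of the three-point formula, the sketch below computes E for the "Live preview" item in this proposal (best case 5, most likely 15, worst case 20 developer days). The function name `three_point_estimate` is an assumption made for this example, and the proposal does not state how results are rounded to whole days.

```python
def three_point_estimate(best: float, most_likely: float, worst: float) -> float:
    """Return E = (a + 4m + b) / 6, with a = best, m = most likely, b = worst."""
    return (best + 4 * most_likely + worst) / 6


# Figures taken from the "Live preview" item of this proposal.
live_preview = three_point_estimate(best=5, most_likely=15, worst=20)
print(round(live_preview, 2))  # 14.17, reported in the proposal as 14 developer days
```

Because the most likely case is weighted four times as heavily as either extreme, a pessimistic worst case shifts the estimate only modestly.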
Must
Identify major bugs in current system and fix
The existing GLAMwiki Toolset has known bugs that have already been triaged in Bugzilla (archive link) [now Phabricator]. It is important to address the most serious of these as a first priority in order to ensure that the fundamental GWT platform is stable irrespective of future features and improvements.
Acceptance Criteria: See description in each major bug in this indicative list
- GWToolset uploads files without file description pages
- GWToolset fails to upload files and throws no warning
- GWToolset throws error with file too long only in some circumstances
- gwtoolset invalid xml file screen should be more descriptive
- GWtoolset: Project link broken on Special:GWToolset broken for translation
- Unit test, using WikiImporter fails to create templatedata
- Delete temporary files as soon as they are not needed
Estimate:
Timebox to 15 developer days.
While we are budgeting a given amount of time there is no guarantee that this time is enough to completely resolve all of these bugs. Additional time during the year is also needed to account for bugs arising during the project itself. Therefore we include an additional timebox estimate of 5 developer days, giving a total of 20 developer days.
Improved reporting
The power and flexibility of the GWT allows the uploading of large amounts of content with complex associated metadata. However, the inherent cost of this power and flexibility is the variety of reasons for which the system might fail, and the difficulty of knowing what should be changed to rectify the issue. Some of the common reasons for difficulty are within the control of the GWT, while others occur ‘upstream’ with WMF infrastructure, and still others are ‘downstream’ with the user’s own computer. Presently, there is very little information given to a GWT user when an upload failure occurs, other than to ‘check the logs’ and to ‘contact a developer’. While this general advice will remain the most useful thing that can be said in many cases, at least some of the most common causes of upload-failure can be addressed with specific error reporting.
Narrative:
As a new GLAMwikiToolset user who is technically competent, has read all the documentation, and has followed procedures, all my files upload correctly. When they don’t, a clear explanation is given to me with advice for solving the problem and how to ask for assistance. This feedback gives me confidence in the progress of my upload.
Acceptance criteria:
- Log in as a user with the GWT-user right and attempt an upload with information that is known to be incorrect. The system fails the upload but informs the user why, suggests how to fix it, and explains what parts of the process need to be repeated (or not).
- Repeat with a different known problem with the source material. The system fails in a consistent manner but provides a different and appropriate error explanation, and suggestions.
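The acceptance criteria above could be sketched roughly as a lookup from a failure cause to a reason, a suggestion and the step to repeat. This is only an illustration in Python, not the extension's actual code: the error codes, messages and the names `UploadError`, `KNOWN_ERRORS` and `explain` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UploadError:
    reason: str       # why the upload failed
    suggestion: str   # how the user might fix it
    repeat_from: str  # which step of the process needs repeating


# Hypothetical failure codes, chosen for illustration only.
KNOWN_ERRORS = {
    "invalid_xml": UploadError(
        reason="The metadata file is not well-formed XML.",
        suggestion="Validate the file and correct the reported line.",
        repeat_from="metadata upload",
    ),
    "media_url_unreachable": UploadError(
        reason="A media file URL in the metadata could not be fetched.",
        suggestion="Check that the URL is public and correctly spelled.",
        repeat_from="final upload",
    ),
}

# Unknown causes fall back to the general advice given today.
FALLBACK = UploadError(
    reason="The cause of the failure is unknown.",
    suggestion="Check the logs and contact a developer.",
    repeat_from="unknown",
)


def explain(code: str) -> UploadError:
    """Return a specific explanation when one is known, else the fallback."""
    return KNOWN_ERRORS.get(code, FALLBACK)


print(explain("invalid_xml").suggestion)
```

Two different known problems yield two different explanations, while the failure path itself stays consistent, which is what the second criterion asks for.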
Estimate:
Given the unknown number of different potential reasons for an upload-fail, it is not possible to give a specific list of bugs that need fixing (or their complexity). Therefore, we allocate a “Timebox” of 5 developer days to address as many as possible, in order of their priority.
Live preview of how metadata will appear
One of the major difficulties reported by newly registered wikimedians (from GLAMs) attempting to use the GWT is knowing what their metadata mapping ‘means’ in a practical sense. Being able to see the results of their work, while they’re doing it, in the format in which it will appear on Wikimedia Commons will increase their understanding of the task and make it a less abstract exercise.
The advice of user-interface designers from the WMF with regards to this feature would be useful as in-kind support for the grant.
Narrative:
As a user making a GWT upload but who is unfamiliar with Commons templates, I am presented with enough information to make an informed choice of which template to use. I am also given an intuitive interface to map my metadata into the correct fields of the template. An example of how my metadata will look is shown to me as I work, using my own files as an example. This gives me confidence in how my work will look at the end, and lets me show my progress to my colleagues.
Acceptance criteria:
- When asked to select a Commons template the user is given a clear explanation of the purpose of each template.
- The chosen template is shown to the user in a way that it would look once the upload is complete. As the user maps their metadata, the demonstration updates to give immediate feedback about the implication of their mapping choices.
- Free-text fields are, as much as possible, replaced with pre-populated lists of options (e.g. licenses)
Estimate:
Worst case, 20 developer days
Best case, 5 developer days
Most likely, 15 developer days
Three-point estimation result: 14 developer days
Should
Prepare for structured metadata
A major forthcoming project of the Wikimedia Foundation and Wikimedia Deutschland is to bring the power of Wikidata’s structured information to Wikimedia Commons. This will greatly increase the quality of Wikimedia Commons’ information by making it more consistent, searchable, machine-readable, exportable and localisable. However, at this point the project is still new and the parameters are undefined. Given that the GWT is both an important tool for power-users of Wikimedia Commons, and also the conduit through which GLAMs (an important stakeholder for Wikimedia Commons) interact with the project, it is necessary that developments to the GWT happen in a way that is consistent with the development of the Structured Data project. As both projects are in a state of development it is an excellent opportunity to ensure that no new blockers are created which would hinder their smooth interaction at a later date. This will require regular contact between the Europeana team and the Wikidata team. As a corollary of this, the Structured Data project will have access to expert advice from the Europeana team, who have a great deal of experience in managing complex and non-compatible metadata integration projects.
Narrative:
As a Wikimedia Commons administrator, when the time comes to switch over to a Wikidata-hosted Structured Data system, all of the files that have been uploaded with the GWT convert correctly.
As a user of the GWT after the Structured Data project has been running on Wikimedia Commons for a while, all of my metadata is uploaded in the [new] best-practice way.
Acceptance criteria:
- The standard Europeana Data Model (EDM) and related vocabularies are appropriately mapped to Wikidata items and properties.
- The GWT, with minimal extra effort, can be made to refer to Wikidata’s schema to populate templates rather than the information on Commons directly.
Estimate:
- Worst case, 15 developer days
- Best case, 5 developer days
- Most likely, 10 developer days
- Three-point estimation: 10 developer days
Accept CSV dataformat for input
The current version of the GWT only accepts metadata in XML format as it is the most flexible and can accommodate any input values. However, it is not necessarily a format that GLAM database managers would prefer to export their metadata in. Irrespective of the format used, some amount of metadata preparation to get the initial data 'cleaned up' before beginning the GWT process will always be required. Increasing the range of accepted formats increases the likelihood that the database manager will be able to work in the format they are most comfortable/capable in. Among all the possible formats out there, CSV and JSON have been identified as being the most popular and useful for extending the functionality of the GWT.
- As a tabular format, Comma [or Tab] Separated Values (CSV/TSV) is very simple, and is best suited for smaller GLAMs that neither have a database manager nor bespoke software.
- JSON provides the most functionality and is the most popular export format for larger and more technically-minded institutions.
However, in order to restrain the scope of the project, only CSV will be considered a priority, and JSON will be a stretch goal.
Narrative:
As a GLAM wanting to use the GWT, I am most comfortable exporting my database information in CSV format. Instead of converting this to a different format (and potentially losing data-quality in the process) I wish to upload that format to the GWT directly to begin my metadata-mapping.
Acceptance criteria:
- The GWT produces the same result when provided with the same input metadata in CSV or XML formats.
- As a stretch goal the GWT will also accept JSON metadata in the same manner.
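The first criterion can be read as a testable property: parsing the same metadata from CSV and from flat XML should yield identical records. The sketch below illustrates that check in Python with the standard library. The sample data, the `<record>` element name and both function names are assumptions for illustration and do not describe the GWT's actual input schema.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical sample metadata, expressed once as CSV and once as flat XML.
CSV_INPUT = "title,author,rights\nSpoonbill,Jane Doe,CC BY-SA\n"

XML_INPUT = """<metadata>
  <record>
    <title>Spoonbill</title>
    <author>Jane Doe</author>
    <rights>CC BY-SA</rights>
  </record>
</metadata>"""


def records_from_csv(text: str) -> list[dict[str, str]]:
    """Read each CSV row into a dict keyed by the header row."""
    return [dict(row) for row in csv.DictReader(io.StringIO(text))]


def records_from_flat_xml(text: str) -> list[dict[str, str]]:
    """Read the direct children of each <record> element into a dict."""
    root = ET.fromstring(text)
    return [
        {child.tag: (child.text or "").strip() for child in record}
        for record in root.findall("record")
    ]


# Both formats should normalise to the same list of records.
print(records_from_csv(CSV_INPUT) == records_from_flat_xml(XML_INPUT))
```

Normalising every input format to one intermediate record structure would also let a later JSON reader reuse the same comparison.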
Estimate:
- Worst case, 15 developer days
- Best case, 5 developer days
- Most likely, 10 developer days
- Three-point estimation: 10 developer days
Improve User interface
As with any new software, the majority of development effort occurs "behind the scenes" to ensure that the system works as expected. However, there has not been any specific focus on the appearance and flow of the GWT to ensure that its functions are clear, logical and consistent. Now that the underlying architecture is in place, and the existing user-interface is stable enough to have a small user-base, it is important to test the assumptions of the original visual and interface design choices (and modify them as necessary).
Narrative:
As a technically capable but first-time user of the GWT I am confident that I know the purpose of each function or question that I am shown, where I am within the process, and how to navigate through that process. I can interact with the system without having to second-guess its workflow so that I can concentrate on the task at hand.
Acceptance Criteria:
- User-testing reports indicate that representatives of targeted user-groups (Wikimedians, GLAM database managers) are able to navigate the GWT process confidently.
- A decrease in requests to the existing GWT user group from new GWT users for personal assistance.
Estimate:
The requirement is very open ended and thus no estimate can be made. Therefore we allocate a timebox of 20 designer days and 20 developer days for pure UI/UX improvements. This comprises two rounds of heuristic review and two rounds of user testing. This also covers any UX needs of other elements in this project proposal.
Facilitate User group
The GWT in its current form has been publicly available and promoted as “ready for use” for approximately half a year. This has resulted in a number of requests for in-person training sessions and increased activity on the mailing-list for GWT users. This group would very likely grow as the feature-set increased and the difficulty decreased. Furthermore, the existing user-group will be very important in providing feedback, support, advice and bug-identification during further development. Under the project design of the first iteration of the GWT a formal “steering group” was established, with representatives of the financial stakeholders of that project. It is possible a similar group could be established - using the 'glamtools' mailing list - representing a wider range of stakeholders. Once development work has finished, active promotion and training of the [new] tool would become the focus. All of these activities require active facilitation both online and in-person.
We also expect to participate at several Wikimedia events in-person throughout the development period as part of the process of increasing awareness, soliciting feedback, and general community liaison. We can expect that there will be approximately 6 events in Europe plus one event outside Europe (Wikimania):
- 1 x Wikimania (Mexico City)
- 1 x European Hackathon (France)
- 1 x GLAMwiki conference (Netherlands)
- 2 x Wikidata / Structured data hackathons (2xGermany?)
- 2 x Requests for training/participation at Chapter-organised events (like, for example, the October 2014 Helsinki GWT training organised by Wikimedia Finland). [Estimate of two requests throughout the year]
This is a total of 7 in-person events.
It is expected that the GWT usergroup (both existing and potential future users), MediaWiki developers, and Commons-experts will be present at these events, thus making Europeana’s attendance important to ensure community support and awareness for the changes to the GWT.
Narrative:
As a member of a European Wikimedia Chapter that has been involved in GLAM activities but never previously with the GWT directly, I am able to easily get involved with and follow the development of the new features of the system. I am also able to help coordinate outreach training for my local GLAMs. This enables me to undertake Wikimedia outreach to new GLAMs with confidence that I have a support network.
Acceptance criteria:
- Different stakeholders of the GWT during the development process are aware of the current state of the project and know who to contact to ask or answer questions.
- Existing stakeholders of the GWT feel incorporated and consulted in the development process.
- A GWT presence at all major Wikimedia events which have relevance to GLAMwiki activities.
Estimate:
Worst case, 30 facilitator days
Best case, 10 facilitator days
Most likely, 20 facilitator days
Three-point estimation: 20 Facilitator days
- Travel budget: €5.500.
- 7 European events ~ 7 * €500 = €3500. Assuming an average of €500 budget for each European event (GLAM WIKI conference, European Hackathon, 2 Wikidata events and 3 requests to attend events)
- Wikimania (Mexico City), €2000.
[Rather than attempt to itemise the cost of travel to the aforementioned events (some of which do not officially have locations/dates, and before the Europeana staff know who would be attending on the project's behalf), this grant estimates a fixed amount which will be used for travel/accommodation/registration for those events in amounts considered appropriate by the Europeana management at the time. It is expected that the total spent on attending these events will exceed this request and the remaining money will come from general Europeana funds]
Improve documentation & Integrate
There is already a large amount of documentation written for the current implementation of the GLAMwiki Toolset (on Mediawiki.org at Help:Extension:GWToolset). It is important that this be complete, consistent, clear, and updated to take into account new features that result from the work indicated in this grant proposal.
It is also important that the documentation be linked from all elements of the GLAMwikiToolset process - so that users can quickly access documentation at their point of need. This could be achieved through links to the existing documentation page that open in a new tab and/or tooltips.
Narrative:
As a first-time user of the GLAMwikiToolset I can access the relevant documentation for the particular part of the process I am currently undertaking. This lets me find the answer to my question quickly and allows me to continue my upload without losing my place.
Acceptance criteria:
- All stages of the GLAMwikiToolset have matching documentation that is accessible to read both before an upload begins, and within the process itself.
- The documentation is user-tested and editorially reviewed to ensure clarity and helpfulness.
- The documentation is fully translated into at least one other language to ensure its translation compliance.
Estimate:
- In-Tool development estimate:
- Worst case, 3 developer days
- Best case, 1 developer day
- Most likely, 2 developer days
- Three-point estimation: 2 developer days
- Documentation-writing estimate:
- Worst case 5 writer days
- Best case 1 writer day
- Most likely 2 writer days
- Three-point estimation: 3 technical-writer days
Could
Screencasts
The best way to learn how to use the GWT, either in advance or during the process, is to follow someone else doing it. While it’s not possible for everyone to sit with an existing GWT user, screencasts are the next best thing. Screencasts have been made during the development of the GWT. It is proposed that a new, scripted and post-produced, series be made after major development of new features is complete. They would be designed to mirror the main steps of the GWT upload process: metadata preparation (to create a flat XML file); obtaining user-permissions; metadata upload and mapping on Beta; troubleshooting your initial upload; custom-templates on Wikimedia Commons; copying the needed information from Beta to Commons; and undertaking the final upload.
Narrative:
As a new GWT user, and a visual learner, I want to watch someone else perform an upload first, so I know what to expect. This gives me confidence in using the tool and allows me to more easily explain what I’m doing to my colleagues.
Acceptance criteria:
- All major steps of the GWT process have a matching screencast
- The videos have a consistent style and are integrated with the improved documentation, the upload software itself, and each other.
- The videos are transcribed and both the video file and timecoded transcription are available in the appropriate format/license to be uploaded on Wikimedia Commons.
Estimate:
Assuming 3 screencasts in total.
Worst case: 3 developer days + 6 video-producer days (2 per screencast)
Best case: 1 developer day + 3 video-producer days (1 per screencast)
Most likely: 2 developer days + 3 video-producer days (1 per screencast)
Three-point estimation: 2 developer days + 3 video-producer days
Split value mapping of multi-value fields
Any transition from one database to another involves an inevitable risk of data-loss through the simplification of the information needed to fit the information into a new format. Increasing the flexibility of what can be accepted correspondingly decreases the amount of data-loss. One of the key inflexibilities of the current GWT implementation from the GLAM's perspective is the inability to split multiple values in one source field and map them to multiple different fields in the target template.
Narrative:
As a GLAM employee with a database of complex metadata pertaining to my collection, I would like to reduce data-loss as much as possible when exporting it. Many items have multiple dates, multiple authors and multiple locations in one field associated with them (e.g. a digitisation of a historic photo of a sculpture that has since been moved to a museum) and I wish to keep as much of that information as possible with the file so that the information on Wikimedia Commons is as good as possible.
Acceptance criteria:
- A user of GWT can split and map a multi-value field to multiple target fields.
- A prerequisite for the feature is that the multiple values in the source element are character-separated.
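A minimal sketch of the requested behaviour, in Python for illustration only: one character-separated source value is split and its parts assigned, in order, to several target fields. The function name `split_and_map`, the field names and the default `;` separator are all hypothetical, and how surplus values should be handled is an open design question that this sketch answers by keeping them in the last target.

```python
def split_and_map(record, source_field, target_fields, separator=";"):
    """Split one character-separated source value across several target fields.

    Values are assigned to targets in order. Any surplus values are kept,
    re-joined, in the last target so that nothing is silently dropped.
    """
    raw = record.get(source_field, "")
    values = [part.strip() for part in raw.split(separator) if part.strip()]
    mapped = {}
    for index, target in enumerate(target_fields):
        if index >= len(values):
            break  # fewer values than targets: leave remaining targets unset
        if index == len(target_fields) - 1:
            mapped[target] = f"{separator} ".join(values[index:])
        else:
            mapped[target] = values[index]
    return mapped


# Hypothetical record: a photograph date and a later digitisation date.
record = {"date": "1880; 2011"}
print(split_and_map(record, "date", ["creation_date", "digitisation_date"]))
```

Assigning by position is the simplest rule; a real implementation might instead let the user choose which value goes to which target.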
Estimate:
- Worst case, 3 developer days
- Best case, 1 developer day
- Most likely, 2 developer days
Three-point estimation: 2 developer days
Support non-flat metadata formats
The current GWT implementation requires that the XML file which includes all the metadata to be used in the upload be in a “flat” structure. This means that there are no “parent-child” or “nested” relationships in the metadata. While this makes it easier for the GWT software to understand the information being submitted, this also increases the amount of work required by the end-user to get their metadata prepared correctly. Many of the standard GLAM metadata software programs use highly structured metadata. Exporting it to a “flat” file is both time-consuming and loses data-quality. Supporting more structured metadata allows less data-loss as it requires less data-conversion from the original format.
Example of a comparison between 'flat' and 'non-flat' metadata | |
---|---|
'Flat' XML | Non-'Flat' XML |
An example of a flat XML file | An example of an XML file with a deeper hierarchy |
<?xml version="1.0" encoding="UTF-8"?>
<metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
</metadata> |
<?xml version="1.0" encoding="UTF-8"?>
<metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
</metadata> |
The metadata in the fields author, subject and rights will be recognised | The metadata in the deeper levels will not be recognised in the current GWT implementation. |
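To make the distinction concrete, the sketch below shows one possible way of reading a nested record and reducing it to flat key/value pairs, which is roughly the conversion that end-users must currently perform by hand. It is written in Python for illustration. The element names are hypothetical and do not reproduce the example files in the table above, and the dotted-path naming is an assumption.

```python
import xml.etree.ElementTree as ET

# Hypothetical nested record; element names are for illustration only.
NESTED = """<record>
  <title>Tay Bridge wreckage</title>
  <creator>
    <name>Unknown</name>
    <role>photographer</role>
  </creator>
</record>"""


def flatten(element, prefix=""):
    """Return {dotted.path: text} for every leaf element below `element`."""
    flat = {}
    for child in element:
        path = f"{prefix}{child.tag}"
        if len(child):  # the child has nested elements: recurse into it
            flat.update(flatten(child, prefix=f"{path}."))
        else:
            flat[path] = (child.text or "").strip()
    return flat


print(flatten(ET.fromstring(NESTED)))
```

This simple version overwrites repeated sibling elements that share a path, so supporting multiple creators, for instance, would need list-valued entries.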
Narrative:
As a GLAM database manager I am able to extract my metadata to a format acceptable to the GWT with as little as possible data-conversion required. This saves me time as well as decreasing the amount of nuance in my data, thereby making the process faster and higher quality.
Acceptance criteria:
- The tool correctly reads a structured XML file to begin mapping
- The tool can work with two target templates not just one
Estimate:
- Worst case, 25 developer days
- Best case, 5 developer days
- Most likely, 15 developer days
Three-point estimation: 15 developer days
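The proposal does not state which three-point formula it uses. The sketch below assumes the common PERT weighting, (best + 4 × most likely + worst) / 6, which reproduces both developer-day figures quoted in this section; the function name is hypothetical.

```python
def three_point_estimate(best, most_likely, worst):
    """PERT-style weighted mean (an assumed formula, not confirmed by the proposal)."""
    return (best + 4 * most_likely + worst) / 6


split_value_days = three_point_estimate(best=1, most_likely=2, worst=3)
non_flat_days = three_point_estimate(best=5, most_likely=15, worst=25)
# split_value_days == 2.0 and non_flat_days == 15.0
```

Because both sets of inputs are symmetric around the most likely value, a plain average of the three points would give the same results, so these figures alone do not show which variant was used.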
Pre-set example mapping for EDM
Closely related to the aforementioned priority to ‘accept different data formats for input’ is the ability of GWT users whose metadata follows the most common standards to easily see how these standards “map” to Wikimedia Commons. While it is not possible to account for every variant of metadata schemas (and that’s not including data that is not correctly formatted in the first place), there are some common types that are “low hanging fruit” and account for a large proportion of the potential user-base. Notable among these is the Europeana Data Model (EDM), in which all material in Europeana is formatted. Implementation of this as an “example mapping” would allow for easy export of multimedia available in Europeana to the GWT, and also provide a template for other common metadata schemas (such as LIDO or CIDOC CRM).
Narrative:
As a GLAM database manager I have spent a lot of time ensuring that my collection records are consistent and standards-compliant so that they are visible on the Europeana website. Because of that, I don’t have to start from scratch to make them compliant with Wikimedia Commons' needs, but can easily convert my data to the needed structure.
Acceptance criteria:
- Easy conversion of EDM metadata for usage with the GWT
Estimate:
The first mapping from EDM to a Wikimedia Commons template (e.g. "art photo"), 4 days.
For each further EDM to Commons mapping, 2 days.
Assuming 4 different Commons templates: 10 data modeller days
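The estimate above is simple arithmetic: 4 days for the first mapping plus 2 days for each further one. A small sketch, with a hypothetical function name, shows how the total would change if a different number of templates were chosen.

```python
def edm_mapping_days(template_count, first_mapping_days=4, further_mapping_days=2):
    """Data modeller days for mapping EDM to a given number of Commons templates."""
    if template_count < 1:
        return 0
    return first_mapping_days + further_mapping_days * (template_count - 1)


days_for_four_templates = edm_mapping_days(4)
# 4 + 2 * 3 = 10 data modeller days, matching the estimate above
```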
Won't [out of scope]
Issues that did not receive prioritisation and are therefore out of scope for this grant application include the items below. They are included here to emphasise the work that has gone into targeting the highest value items while recognising that there are other items which are still very desirable. It is hoped that the list below can also be addressed by Europeana, Wikimedia Foundation, or another party in the future.
Explanation of all project ideas that are out of scope for this grant |
---|
One of the key technical constraints of the current GWT implementation is that the files to be uploaded to Commons must already be publicly visible online. This requirement ensures the GWT system can continue working regardless of whether the uploader’s own computer is turned off etc. However, many potential uploaders do not meet this criterion - either because they don’t have a website in the first place, or because the high-resolution files are not currently publicly visible. Allowing the upload of offline files would greatly increase the usability of the tool, but this would most likely require integration with some form of temporary-hosting system (e.g. potentially Europeana Cloud, Wikimedia Labs, or Commons Beta), a high-difficulty project which would greatly increase the required budget.
Statistics on multimedia usage from GLAMs in Wikimedia is a perennial request from the GLAMwiki community - both from within Wikimedia and from GLAMs themselves. See previous Europeana-created, Wikimedia-Chapter commissioned, report on GLAM needs. However this is a separate topic to the upload technology and should be treated as such.
The current GLAMwiki Toolset system requires the uploader to request and receive user-based permission first on Commons Beta, then on ‘real’ Commons. This is to ensure the powerful tool is not misused. However, this process is cumbersome and slows down uploads. It also means that a one-time user has the permission allocated to their user account in perpetuity. Instead, the original plan of the GLAMwikiToolset was to give permission for a mass-upload on a batch-by-batch basis, with both the uploader and an appropriate authority in Wikimedia Commons needing to press the “go” button before the upload would proceed. However, this change in workflow, while important, would distract from the more fundamental improvements and has been left out in order to fit within the budget.
Upon successful upload of good quality multimedia, a GLAM naturally wishes for their material to be used in Wikipedia articles. Often there are pre-existing lower-quality files of the same artworks etc. already on Wikimedia Commons that could be ‘swapped out’ with the new files that are better quality. This functionality is already provided for within the CommonsDelinker tool and therefore can be left outside the scope of this project without significant loss. Nevertheless, instructions on how to use CommonsDelinker are planned for the documentation of the GLAMwikiToolset.
In the current Wikimedia Commons Special:UploadWizard step entitled “Upload” there are two buttons - one to “select media files to share [from your own computer]” and below it, in smaller font, one to “share images from Flickr”. There is no reason why other major sources of freely-licensed online multimedia could not also be included alongside Flickr - in this case from Europeana. While this would be of strategic value to Europeana (to encourage the re-use of material available on its website) this is a one-by-one upload system and therefore a different project to the GLAMwikiToolset.
Once the GLAMwikiToolset receives a certain degree of community support and technological stability, it is hoped that it will be linked to directly from the default “upload” page. Users would be provided the choice to use the Upload Wizard (which is designed for 1-50 files at a time) or to the GLAMwikiToolset (a.k.a. “mass upload”) which is designed for any number of files. However, this feature is more a matter of community acceptance than of software development per se, and can therefore be left off the requirements list for this grant application.
One of the key procedural elements of the GWT is the creation and uploading of metadata (a flat-XML file) which includes all the descriptions which will be associated with the multimedia. However, this information is not modifiable once it has been uploaded, frequently forcing a user who is unfamiliar with the metadata-needs of Wikimedia Commons to go back to the beginning to re-export their metadata and re-start the GWT process. Therefore, it is important to allow for the modification or “cooking” of the metadata once it is already online so it can conform to GWT needs. Some independent effort has already been undertaken in this regard (e.g. “GWcook”) but this is out of scope for integration because validation should be server-side, not part of the GWT.
Every Europeana search result includes several options in the sidebar: “View item at <institution>” allows the user to click through to the GLAM website that owns the item; “Cite on Wikipedia” produces MediaWiki markup with a pre-filled {{cite web}} template; and “share” provides various social media links. This proposed additional feature would determine whether the image had already been uploaded to Wikimedia Commons and, if so, provide a link to “View item on Wikimedia”. Alternatively, if the file was not yet on Wikimedia (and had an appropriate license), it would give the user the option to upload it there. This feature was not prioritised as it is separate from the GLAMwikiToolset.
As with any industry, there are several major software options for GLAM databases. Europeana has for the last couple of years attempted to have a ‘push to Europeana’ button built into these systems as a default feature, which would easily export the contents in a Europeana-compatible format. A ‘push to Wikimedia’ function would be equally valuable. This was not prioritised on the basis that this has proved to be an extremely complicated and slow process. It is high risk and high reward but outside the scope of a one-year grant.
One of the key outcomes of the first-round of GWT development was a report, now available on Commons, of the requirements for usage and reusage of statistics for GLAM content. This was designed to feed in to WMF plans for development of Metrics infrastructure. It was requested that this grant could also write an equivalent report on GLAMs' needs for metadata export from Commons. This would show what kinds of information GLAMs would like to extract from Commons after they had uploaded their own materials, in order to improve their own databases with the crowdsourced information (categories, translations, geotags…) that the Wikimedia community had created. However, this was not prioritised on the basis that, even though this is a desired outcome for the future, it is a separate feature from the GWT in this round of development.
The creation of a “partnership template” and an “institution template” is best-practice for any mass upload to Wikimedia Commons. Currently, users of the GWT still need to create these templates manually which therefore requires them to become familiar with MediaWiki template syntax - something the GWT was designed to avoid. However, because this feature requires creation of new templates on-wiki, it is a completely different task to mass-uploading and therefore requires a different project to undertake it.
This suggestion is based on the idea that a lot of work (and therefore time) for the user could be saved if the GWT could “suggest” what certain items in the metadata schema should be mapped to Wikimedia Commons’ templates. While this is true, this feature was not prioritised because the complexity of the mapping process should be greatly decreased by the other elements that are included in the grant application. If the process remains too difficult after the other elements are finished, the priority of this item could be revisited. |
Resources & Risks
Resources
This will be the second part of the GLAMwiki Toolset project. The project was originally set up and funded (in 2012 and 2013) by Wikimedia FR, Wikimedia UK, Wikimedia NL and Wikimedia CH and has strong support from the Wiki community and the GLAM-sector.
Within Europeana the relevant staff are:
- David Haskiya, Product manager (DivadH)
- Dan Entous, Lead Developer (Dan-nl)
- Liam Wyatt, GLAMwiki facilitator and GWT project manager (Wittylama) (a position formerly held by Àlex Hinojo [ Kippelboy ])
- Other staff and organisational infrastructure of Europeana are available resources as/when needed - notably UX Design and Data modeling but also other teams such as marketing and finance.
Non-Europeana resources are:
- Volunteer Wikimedians who were officially involved in the 1st round of development as part of the official "steering group", and who are likely to provide significant input to future development, include (but are not limited to) User:Multichill, User:Fae, User:Kippelboy, User:Jean-Frédéric.
- Several user-groups within the Wikimedia community who are well placed to provide assistance and feedback throughout the development process - see the "community notification" section below, as well as the "Facilitate user group" budget item above.
Risks
The risks of this project are equivalent to those in any software development project that has multiple stakeholders.
- Scope creep
In order to avoid extra requested items being added to the scope of the project after it has begun, we have listed all the known requests that are out of scope here in this grant application - under the "WON'T" heading and with an explanation for this decision. This makes it easier to explain why certain features will not be developed and also helps provide clear guidance for the team during the project.
- Cost overrun
Spending more than budgeted is always a risk in any project. We have attempted to address this by using 'three-point analysis' to estimate the most likely costs for each budget item, as well as drawing on the extensive software-development budgeting experience of Europeana.
- Financial insecurity
One of the complexities of the initial development of the GWT was the multiple funding sources - Wikimedia Chapters - for the project (and the changing nature of how they raised, and were allowed to spend, their budget due to WMF fundraising policy decisions). This meant that payments (both in terms of amount, and in timing) were not consistent with expectations. Applying for funding directly from the WMF increases the confidence of Europeana in reliable funding.
- Communication breakdown
A difficulty of the original GWT development was the frequently changing 'contact person' from various stakeholder groups. This slowed down and complicated communication lines. This risk is being addressed by having a specific Europeana GLAMwiki coordinator (Liam Wyatt), fewer financial stakeholders, as well as by having greater visibility for the tool in general (since the software is already 'live').
- Code review delay or rejection
A key difficulty of the original project was delays in receiving code review from suitably qualified people. This greatly slowed down development. This is being addressed by specifically identifying code-review as a necessary non-financial support element of the grant.
- Dissatisfaction of the volunteer community
It is possible that, despite everything, the eventual software that is produced is not accepted by the Wikimedia community (or, indeed, by GLAMs). This is the existential risk of the whole GWT enterprise but it is mitigated by having: a 1 year grant length; an existing product that demonstrates proof-of-concept as well as proof-of-community-acceptance; specific budgeting for community facilitation; demonstrable interest and support in the GWT within the GLAMwiki community.
Budget
Cost breakdown
Indicative costs in €/Euros per day for each type of role:
Roles in yellow are requests for payment with this grant.
Roles in orange will be paid by Europeana.
Total cost per role
- 100 Developer days @ €680 per day
- 3 Video producer days @ €600 per day
- 13 Product manager days @ €400 per day
- 20 UX Designer days @ €330 per day
- 10 Data modeller days @ €330 per day
- 3 Technical writer days @ €330 per day
- 17 Project manager days @ €240 per day
- 20 Facilitator days @ €240 per day
The daily-rate of the different roles listed here is calculated based on the average hourly salary of the relevant Europeana employees. As a project-based organisation that frequently accepts funding tied to specific objectives, Europeana normally accounts for its staff obligations in 1 hour blocks. Therefore, the costs listed here are not invented specifically for this grant application; they are simply the normal hourly salary of that person x 8 hours = 1 daily rate.
Total cost per activity
Grant section | Prioritisation | Item | Role | No. of days | € per day | Total cost | Notes |
---|---|---|---|---|---|---|---|
2.1.1.1 | Must | Identify major bugs in current system and fix | Developer | 20 | 680 | 13.600 | See also the non-financial requirements |
2.1.1.2 | Must | Improved reporting | Developer | 5 | 680 | 3.400 | |
2.1.1.3 | Must | Live preview of how metadata will appear | Developer | 14 | 680 | 9.520 | See also the non-financial requirements |
2.1.2.1 | Should | Prepare for structured metadata | Developer | 10 | 680 | 6.800 | See also the non-financial requirements |
2.1.2.2 | Should | Accept CSV data format for input | Developer | 10 | 680 | 6.800 | |
2.1.2.3 | Should | Improve User interface | Developer | 20 | 680 | 13.600 | See also the non-financial requirements |
2.1.2.3 | Should | Improve User interface | UX Designer | 20 | 330 | 6.600 | |
2.1.2.4 | Should | Facilitate User group | Facilitator | 20 | 240 | 4.800 | |
2.1.2.4 | Should | Facilitate User group | Travel expenses | 6 European events | 500 each | 5.000 | Total for both travel rows |
2.1.2.4 | Should | Facilitate User group | Travel expenses | Wikimania | 2.000 | | |
2.1.2.5 | Should | Improve documentation and Integrate | Developer | 2 | 680 | 1.360 | |
2.1.2.5 | Should | Improve documentation and Integrate | Technical writer | 3 | 330 | 990 | Potential to be sub-contracted |
2.1.3.1 | Could | Screencasts | Developer | 2 | 680 | 1.360 | |
2.1.3.1 | Could | Screencasts | Video producer | 3 | 600 | 1.800 | Potential to be sub-contracted |
2.1.3.2 | Could | Split value mapping of multi-value fields | Developer | 2 | 680 | 1.360 | |
2.1.3.3 | Could | Support non-flat metadata formats | Developer | 15 | 680 | 10.200 | |
2.1.3.4 | Could | Pre-set example mapping for EDM | Data modeller | 10 | 330 | 3.300 | Potential to be sub-contracted |
Overhead cost | Product management | Product manager | 13 | 400 | 5.200 | 10% of combined Developer days (100) + UX Designer days (20) + Data modeller days (10) = 130 | |
Overhead cost | Project management | Project manager | 17 | 240 | 4.080 | 10% of total no. of all estimated days, excluding project manager = 169 | |
Cost for Europeana | €15.100 (~$17,100 USD) | ||||||
Request for WMF grant | €84.670 (~$96,100 USD) | ||||||
Total cost of project | €99.770 (~$113,200 USD) |
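The three totals above can be cross-checked against the per-role day counts and daily rates. The sketch below is one way to do so. It assumes that travel amounts to 6 × €500 plus €2.000 for Wikimania (the €5.000 shown in the table) and that travel is charged to the grant, since the requested amount only reconciles with travel included. Variable names are illustrative.

```python
# Daily rates (EUR) and day counts per role, as listed under "Total cost per role"
DAY_RATES = {
    "Developer": 680, "Video producer": 600, "Product manager": 400,
    "UX Designer": 330, "Data modeller": 330, "Technical writer": 330,
    "Project manager": 240, "Facilitator": 240,
}
DAYS = {
    "Developer": 100, "Video producer": 3, "Product manager": 13,
    "UX Designer": 20, "Data modeller": 10, "Technical writer": 3,
    "Project manager": 17, "Facilitator": 20,
}
PAID_BY_EUROPEANA = {"Product manager", "UX Designer", "Data modeller"}

# Assumption: six European events at 500 each plus 2000 for Wikimania
TRAVEL = 6 * 500 + 2000

role_costs = {role: DAYS[role] * DAY_RATES[role] for role in DAYS}

europeana_cost = sum(
    cost for role, cost in role_costs.items() if role in PAID_BY_EUROPEANA
)
grant_request = TRAVEL + sum(
    cost for role, cost in role_costs.items() if role not in PAID_BY_EUROPEANA
)
project_total = europeana_cost + grant_request
# europeana_cost == 15100, grant_request == 84670, project_total == 99770
```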
Amount paid by Europeana
- €15.100 (~$17,100 USD)
This is the cost of the Product manager, UX designer, and Data modeller - all roles which are marked in orange above.
Europeana will also assume the financial costs of administering the project within its existing organisational framework. This includes provision of office space, management oversight, payroll and other legal infrastructure necessary for undertaking a major software development project. It also includes subsidising other areas of the budget with existing Europeana funds - especially for travel. While difficult to quantify, this is not cost-neutral and represents the commitment of Europeana to the project and support of the goals of the GLAMwiki community.
Amount requested from the PEG program
- €84.670 (~$96,100 USD)
This amount represents the cost of the Developer, Facilitator, Project manager, Video producer, and Technical writer - all roles which are marked in yellow above.
Non-financial requirements
Aside from the grant money, Europeana seeks four assurances from the Wikimedia Foundation engineering team. These assurances are formal requirements and, without them, Europeana will not promise to undertake the project (if funded):
- Timely code review: One of the major difficulties in delivery of the first round of GLAMwiki Toolset development was ensuring timely code-review and lines of communication from Europeana to the Wikimedia Foundation technical staff. In recent months communication channels have improved greatly. Nevertheless, code review remains a key dependency of the successful delivery of this project that is outside the control of Europeana. Therefore, confirmation of Code Review allocation from WMF technical department or provision of an alternative method for being trusted to review our own code, would be a requirement. Related to this, but worth noting specifically, is the requirement for Security Review due to the fact that the GWT is fully integrated to Commons, not a 'Labs' tool.
- Allocate design support: Similarly, some elements of this project call for specific skills and knowledge of the Wikimedia Foundation technical staff and/or wider Wikimedia technical community. These include user-interface design, user-experience testing, API design, and Commons-Beta operation. Confirmation of resource-allocation from WMF design team to support this project as-needed would be most helpful.
- Ownership of technical dependencies: Development of the GWT will no-doubt identify some technical-dependencies and other software bugs which are beyond the control of Europeana to fix. Europeana can identify and triage them, but it would be the WMF's responsibility to address them (or, WM-DE in the case of Wikidata issues). Europeana cannot be held responsible for bugs and technical dependencies that are beyond the scope of this grant but which negatively affect its progress.
- Product ownership long term: Longer term, Europeana does not make assurances that it will remain responsible for maintenance of the GWT codebase. While Europeana expects and hopes to continue to work with the Wikimedia community, there must not be a formal expectation of code maintenance on its part. Europeana will, as part of this project's "Facilitate user group" budget item, attempt to harness community-developer support for the GWT. However, the WMF would need to promise product ownership in the long-term.
Discussion
Community Notification
Relevant community groups who have been aware of the development of the original tool, and have been involved (to greater or lesser degrees) in the creation of this application include:
- GLAMtools mailing list (the user-group of the GLAMwikiToolset)
- Cultural Partners mailing list (the wider GLAMwiki community)
- Commons bureaucrats noticeboard (the group who operate the user permissions for the tool)
- Europeana’s “Wikimedia Task Force” (the strategic working group within Europeana made up of representatives of Wikimedia and GLAMs)
Endorsements
Do you think this project should be selected for a Project and Event Grant? Please add your name and rationale for endorsing this project in the list below. Other feedback, questions or concerns from community members are also highly valued, but please post them on the talk page of this proposal.
- On behalf of Wikimedia UK which is keen to see this happen and sees this as fitting well with our own GLAM program, as well as being of benefit to the movement generally including for content supplying partnerships that are unrelated to GLAM. Jonathan Cardy (WMUK) (talk) 15:23, 13 February 2015 (UTC)
- On behalf of Wikimedia NL, as well as a former employee at a large archive I think improving the tool is vital for it to have the impact it could potentially have. Cultural institutions are keen to get involved with their collections and expertise. This tool gives them a chance to do so much more easily. 85jesse (talk) 15:48, 13 February 2015 (UTC)
- yes, please, this is a perennial must have for GLAMs. we need improved tools for mass uploads. Slowking4 (talk) 20:51, 13 February 2015 (UTC)
- My experience with GLAM institutions as a Wikipedian in Residence suggests to me that 1) there is currently great interest in contributing to open public data image repositories like Wikimedia Commons and 2) if Wikimedia Commons cannot compete in terms of usability, it will eventually be superseded by something that works better, that will acquire that content. Right now, Wikimedia has an advantage. I'd like to see GLAM initiatives like this one supported because I believe that they are very important for the future of the Commons. Mary Mark Ockerbloom (talk) 01:29, 14 February 2015 (UTC)
- A tool that is highly needed by cultural institutions and definitely needs further development. This tool enables cultural institutions to easy upload free licensed files, with no easy alternative available. - publisher newsletter This Month in GLAM & board member of Wikimedia Belgium - Romaine (talk) 11:33, 14 February 2015 (UTC)
- I did quite a few uploads to Commons (about 2 million) and I'm happy to see that the GWToolset helps to spread the load. I was on the steering group for the 1.0 version and would love to see it being improved to a 2.0 version. Multichill (talk) 15:10, 14 February 2015 (UTC)
- Wikimedia District of Columbia plans on making significant use of the GLAMwiki Toolset in the coming years, and we consider the further development of this software to be crucial for our mission. harej (talk) 22:07, 16 February 2015 (UTC)
- Wikimedia Sverige have long term plans in supporting organisations to upload their media collections to Wikimedia Commons and believe this is a strategic step in making that easier, thus in the end actually freeing resources from the movement. Jan Ainali (WMSE) (talk) 08:04, 17 February 2015 (UTC)
- I was on the steering group for the 1.0 version and would love to see it being improved to a 2.0 version. The audiovisual archive I work for my day job has used the current tool with great success. But the suggested improvements would allow our non-technical staff to also use it. --Mbrinkerink (talk) 16:37, 17 February 2015 (UTC)
- Definitely would be a huge plus to see improvements to the GWT. Key issues for me would be improvements to usability. PatHadley (talk) 16:19, 14 February 2015 (UTC)
- As a mediawiki developer, I think further development of this tool will significantly benefit Commons. Bawolff (talk) 00:17, 18 February 2015 (UTC)