Content Partnerships Hub
Improving the Wikimedia movement’s work with content partners
Metabase for the global movement
Background
In this case study, we report on our effort to fill Metabase with sample data about the resources published and the activities held by the Wikimedia movement, with a particular focus on content partnerships. The primary goal is to evaluate whether the structured data format is a suitable tool for storing this type of information.
Additionally, we wanted to examine and evaluate the modeling structures we had developed when working with Wikimedia Sverige's own data. While working with data produced by others is difficult without external input, we hoped to at least get an idea of the challenges involved.
For general information about how Wikibase is structured and why it was created, see the case studies Setting up a Wikibase from scratch and Metabase for chapter data.
As stated in Wikimedia Sverige's application for the Movement Strategy Implementation Grant, one of the goals of Metabase is to make it easier for everyone to identify and locate available material about content partnerships. For years, affiliates and individuals alike have been sharing their experiences and learnings in different forms and on different platforms, both within the Wikimedia ecosystem and beyond it. There's a lot of interesting material out there: reports, blog posts, newsletters, slide decks, posters and video recordings. The strength of the GLAM-Wiki movement is that it encompasses a wide variety of experiences, skills and voices. There are no obstacles to sharing your work in whatever form you find most suitable.
On the other hand, this variety of formats and platforms can make it hard for others to find your resources and learn from them, which is why capacity building is a central focus of the Content Partnerships Hub that Wikimedia Sverige is working to establish. For affiliates and volunteers all around the world to be able to build stronger content partnerships, they must be able to learn from each other; a small, newly created chapter should not be forced to reinvent the wheel when there is a plethora of resources created by more established chapters to tap into. Our vision for the Content Partnerships Hub is that it should facilitate the flow of knowledge among affiliates and volunteers, making it easy for everyone to both share and benefit from those resources. To achieve this, we need technical infrastructure that is both flexible and easy to use.
Part of Wikimedia Sverige's contribution to capacity building on a global scale is the Content Partnerships Hub Helpdesk, an infrastructure that provides hands-on support to Wikimedians planning and executing content partnerships, especially in local communities that are currently underserved and underrepresented in the Movement. Wikimedians can submit questions and requests to the Helpdesk, and the Hub staff will either provide the help needed or put the requester in touch with someone who can assist them. The work of the Helpdesk is directed by an Expert Committee, consisting of experienced members of the Movement with a variety of backgrounds. We see Metabase as a natural extension of the Helpdesk: it can hold material identified when responding to a specific request, and it can support evaluating and analyzing what capacity building material exists and what is missing or needs updating. We want Metabase to become a place where everyone can search for – and contribute to – the global library of Wikimedia resources on their own.
With the data stored in a linked, structured knowledge base, it's possible to search it according to your own needs. You can, for example, query for links to YouTube tutorials about Wikimedia Commons in Swedish, or slide decks from presentations on library partnerships. Incoming Helpdesk requests could be stored in Metabase as well, making it easier for the community in general and the Expert Committee in particular to get an overview of what has been done and what material the Helpdesk has produced specifically to fulfill the requests.
Scope
The Movement material in scope of Metabase encompasses:
- Conferences and other events (seminars, edit-a-thons, campaigns etc.) about Wikimedia related topics;
- Presentations and other contributions, such as panel participation, by Wikimedians and/or on Wikimedia topics, in events not organized by the Wikimedia movement;
- Material produced as a result of or in connection with the above, such as slide decks, posters, reports, video recordings;
- Publications, such as articles, blog posts, tutorials – in both text and video form – on Wikimedia topics.
Limitations
It should be kept in mind that our goal with the initial development of the Metabase content has not been to fully cover any particular subject area. Our staff resources and time are limited, so we had to decide which direction of work would bring the most benefit to the project. One alternative would have been to focus on one particular area, research it in depth and provide full coverage. It was definitely an attractive alternative – who doesn't like exploring one specific topic? – but it would have meant that we could not present a nuanced view of what Metabase could be. Our goal with the project is to explore the possibilities of the platform and experiment with different topics before we invite everyone in the movement to build on the foundations and expand them.
For this reason, we prioritized covering a broad set of examples shallowly over covering a few in depth. We hope that this allows us to showcase the opportunity space better. Besides, we would not have been able to research the details of the work done by Wikimedians globally, as we are just a couple of individuals with extremely limited language skills, which prevents us from familiarizing ourselves with the vast amounts of work done beyond the Anglosphere. As a result, there will be gaps even within the areas we chose to focus on specifically. Our ambition has been to provide a starting point for the Movement, with plenty of examples, to make it as easy as possible for anyone to pick up where we left off and continue developing the content.
We also hope that our work will be discussed and criticized, so that the final shape of Metabase is a collaborative effort by the global community. We have built a foundation but in the end, a comprehensive knowledge base will require continuous work from the community.
Method
We tested two approaches to filling Metabase with data.
The first approach was event-focused. We selected two GLAM Wiki conferences (GLAM Wiki 2023 in Uruguay and GLAM Wiki 2018 in Israel) and input the information about the events and activities that took place during them. The reason for working on these particular events is that they combine an international scope with a focus on collaboration with cultural heritage institutions, which is in line with our vision for Metabase as a resource for content partnerships. We assumed we could find many relevant presentations and documents there.
The second approach was topic-focused. We chose OpenRefine as a focus topic. This open-source software is being used widely by the Wikimedia community for uploads and editing on both Wikidata and Wikimedia Commons. The Content Partnerships Hub Helpdesk, Wikimedia Sverige's support infrastructure for the Movement, regularly receives requests from volunteers and affiliates that can be fulfilled using OpenRefine, so we are aware there is a great need among Wikimedians to learn to use the software in different contexts.
Apart from Wikimedians, OpenRefine is used by data journalists, scientists, information professionals and others. Since the software has a broad range of applications in several communities, many information resources have been created, and we believe it is worth the effort to collect them in one place to facilitate knowledge exchange between communities. The resources take a broad range of forms, from help pages on the Wikimedia platforms to blog posts, YouTube videos, presentation slides and scholarly articles.
Another aspect of OpenRefine that makes it an interesting topic for this case study is that the Wikimedia Commons features are relatively new, leading to a lot of interest from Wikimedians wishing to start using it for file upload and SDC work. In order to do that, they need to be able to locate and access the available resources, which is exactly what Metabase sets out to facilitate.
Event data input
The workflow of inputting data from a multi-part event, like a conference, is as follows:
- Locate the conference program.
- Create an item for the conference.
- Create items for each session.
- Create any items necessary to describe the details of the session, such as the person(s) and organization(s) involved, the language of the session, or the main subject.
- Link the session items to the conference item using part of / has part(s).
- Create an item for the slide deck used, if any, and link it to the session item using uses / used by.
See, for example, the session A missing piece of the puzzle: Providing direct support for content partnerships through the Helpdesk at the Content Partnerships Hub at the GLAM Wiki 2023 conference.
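The workflow above can be sketched as a small in-memory model. This is an illustration only, not the Metabase or Wikibase API: the `Item` class, the `link` helper and the session and slide deck labels are hypothetical. Only the property labels (part of / has part(s), uses / used by) and the conference name are taken from the text.

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    """Hypothetical stand-in for a Metabase item: a label plus its statements."""
    label: str
    statements: list = field(default_factory=list)  # (property label, target label)


def link(subject, prop, target, inverse_prop):
    """Add a statement on the subject and the inverse statement on the target."""
    subject.statements.append((prop, target.label))
    target.statements.append((inverse_prop, subject.label))


# Create items for the conference, one session and its slide deck.
conference = Item("GLAM Wiki 2023")
session = Item("Example session")        # placeholder label
slide_deck = Item("Example slide deck")  # placeholder label

# Link the session to the conference, and the slide deck to the session.
link(session, "part of", conference, "has part(s)")
link(session, "uses", slide_deck, "used by")

for item in (conference, session, slide_deck):
    print(item.label, item.statements)
```

Storing both directions of each link mirrors the paired properties named in the workflow, so a reader can navigate from the conference down to a slide deck or from a slide deck back up to its event.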
The session format
What is a conference session? Intuitively, we first assumed that every session described in the conference program would be a discrete presentation. However, this turned out not to be the case. Several independent presentations by different speakers can be grouped in a thematic session under a common title, and such a session has a single entry in the conference program. This is a typical format for lightning talks, but is not limited to them. For this reason, we decided on the following session model:
- A conference consists of several sessions.
- These are linked using has part(s) / part of.
- A session can be either a hybrid event, an in-person event or an online event. This means that every session has two instance of statements: one identifying it as a session and one giving its format.
- A session consists of one or several specific activities. For example, a session can contain three separate presentations, by different speakers, on the same topic. What brings them together is that they are grouped as one session in the program. This is modeled using has part(s) of the class and the qualifier quantity.
- Example: Wikidata for cultural heritage, which contains three presentations. Compare the description in the conference program.
- A session can have one or several speakers (people who present) or leaders (people who facilitate a practical activity, like a workshop).
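The session model above can be expressed as a small data structure. This is a minimal sketch under stated assumptions: the class and field names are hypothetical, the generic class label "session" is assumed (the text does not name it), and the format given for the example session is a placeholder rather than a recorded fact.

```python
from dataclasses import dataclass, field

# The three formats named in the session model.
SESSION_FORMATS = {"hybrid event", "in-person event", "online event"}


@dataclass
class PartsOfClass:
    """A 'has part(s) of the class' statement with its 'quantity' qualifier."""
    activity_class: str
    quantity: int


@dataclass
class Session:
    label: str
    session_format: str
    parts: list = field(default_factory=list)
    speakers: list = field(default_factory=list)  # people who present
    leaders: list = field(default_factory=list)   # people who facilitate an activity

    def __post_init__(self):
        if self.session_format not in SESSION_FORMATS:
            raise ValueError(f"unknown session format: {self.session_format}")

    def instance_of(self):
        """Return the two 'instance of' values; 'session' is an assumed class label."""
        return ["session", self.session_format]

    def activity_count(self):
        """Total number of activities grouped under this session."""
        return sum(part.quantity for part in self.parts)


session = Session(
    label="Wikidata for cultural heritage",
    session_format="in-person event",  # placeholder, not taken from the program
    parts=[PartsOfClass("presentation", 3)],
)
print(session.instance_of())
print(session.activity_count())
```

Recording a class and a quantity, rather than one item per talk, keeps data entry light for grouped sessions such as lightning talks while still showing how many activities a session contained.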
Topical data input
In order to input data about resources related to a particular topic, the resources have to be identified. The following sources were used:
- Wikimedia Commons categories OpenRefine slide presentations and OpenRefine video presentations.
- OpenRefine/Presentations on Meta.
- Programs of the major Wikimedia conferences, such as Wikimania and WikidataCon.
- Google searches.
Due to our own limitations and bias, the majority of the resources identified were in English.
Results
Events
The conferences GLAM Wiki 2023 and 2018 were input into Metabase. The two conferences cover 146 sessions. 100 unique index terms (keywords) were used to describe the topics of the sessions.
Showcase SPARQL queries
Topics
As of June 2024, there are 74 items with a main subject = OpenRefine statement. A large number of those are events, such as conference sessions. 46 of them are different types of published documents: mostly slide decks from presentations, but also a number of video recordings, tutorials and blog posts. The majority are in English, with a small number of resources in Swedish and other Western European languages. This reflects our limitations when locating the resources, as our assumption is that resources do exist in other languages, and it makes clear how important it is for more participants from different backgrounds to contribute. An additional benefit of using Metabase to survey the available resources is that it will make it easier for everyone to notice patterns, such as which languages are over-represented or under-represented in relation to the number of Wikimedians who might need them, and provide a groundwork for prioritizing the creation and translation of resources in the most needed languages.
We can use the fact that many of the resources have multiple main subject statements to examine which topics most frequently co-occur with OpenRefine. Those include, not surprisingly, Wikidata, Wikimedia Commons and upload. We imagine that as our data grows, we will be able to gain interesting insights about different co-occurring topics.
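The co-occurrence idea can be illustrated with a short counting sketch. It assumes the main subject statements of each item have already been exported as a set of topic labels; the function name and the sample data are invented for illustration and do not reflect the actual Metabase contents.

```python
from collections import Counter


def co_occurring_subjects(items, focus):
    """Count the topics that appear alongside `focus` in an item's main subjects."""
    counts = Counter()
    for subjects in items:
        if focus in subjects:
            counts.update(s for s in subjects if s != focus)
    return counts


# Hypothetical sample: each set holds the main subjects of one item.
sample_items = [
    {"OpenRefine", "Wikidata"},
    {"OpenRefine", "Wikimedia Commons", "upload"},
    {"OpenRefine", "Wikidata", "Wikimedia Commons"},
    {"Wikidata"},  # ignored: OpenRefine is not among its subjects
]

counts = co_occurring_subjects(sample_items, "OpenRefine")
for topic, n in counts.most_common():
    print(topic, n)
```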
Since published documents have a publication date, it's also possible to plot them over time. This enables us to quickly see which resources are the oldest, and thus potentially outdated – we might not want to refer to them when advising someone about the most relevant learning material. Being able to see the most recent resources is useful for those who want to catch up on the newest functionalities in OpenRefine, such as Wikimedia Commons integration.
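As a rough illustration of working with publication dates, the sketch below counts resources per year and orders them newest first. The resource labels and dates are invented placeholders, and the helper names are hypothetical; real data would come from a query against Metabase.

```python
from collections import Counter
from datetime import date

# Hypothetical (label, publication date) pairs.
resources = [
    ("Example slide deck A", date(2019, 5, 14)),
    ("Example video tutorial B", date(2023, 11, 2)),
    ("Example blog post C", date(2023, 3, 20)),
]


def count_per_year(items):
    """Number of resources published in each year, ready for plotting."""
    return Counter(published.year for _, published in items)


def newest_first(items):
    """Resources ordered from most recent to oldest publication date."""
    return sorted(items, key=lambda item: item[1], reverse=True)


per_year = count_per_year(resources)
ordered = newest_first(resources)

print(dict(per_year))
print([label for label, _ in ordered])
```

The last entries of the ordered list are the candidates to review for being outdated, while the first entries are the ones most likely to cover recent functionality.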
Challenges and considerations
In general, expanding our scope from Wikimedia Sverige's own data into resources from the global Wikimedia movement was a challenging but also interesting experience. It gave us an opportunity to reflect on the current practices of knowledge management within the Wikimedia movement; a necessity if we want to improve it.
The following issues became apparent during the work:
Data quality
Data degradation has proved to be an issue, especially when researching conferences and other events. This was relevant when compiling the material on OpenRefine: in order to describe a slide deck from a conference presentation, we need to input at least basic information about the event.
The further back in time we go, the higher the probability that the conference program has been moved from its original website, or even deleted altogether; there is no guarantee that a snapshot taken at an appropriate time is available in the Internet Archive. While events organized by the Wikimedia movement, and documented on one of the wikis, are relatively easy to research, those arranged by other actors can require more digging, especially if the program was published in a non-standard format or was only available to logged-in participants.
That said, events organized by the Movement are not always easy to model either. Different conferences present their programs in different formats, and different yearly editions of the same conference are not necessarily consistent. Crucial information, such as the language of the session (in multilingual events) or the affiliation of the speakers, might not be immediately visible. Also, post-conference documentation, such as slide decks and video recordings, is not always easy to find and link to the specific sessions.
Scope
When working with OpenRefine material, the question of scope became apparent.
The fact that multiple discrete communities are using and sharing information about the tool makes it an interesting case, and was indeed one of the reasons why it was selected as a focus topic. Some Wikimedians might not be aware of the resources provided by the library community, and vice versa, but manuals on e.g. data editing with GREL and Jython can be useful to all users, regardless of the scope of their work.
At the same time, it should be noted that this variety of available resources, produced by both Wikimedians and non-Wikimedians, did force us to reflect on the exact scope of Metabase. Yes, some educational material created with non-Wikimedians in mind is of great value to the community; an article on GREL can be valuable to any of us even if it doesn't mention Wikidata at all. But where to draw the line? There's also the question of how we should approach items such as books and scholarly articles, which are in scope of Wikidata. It might be enough to store very basic information on them in Metabase, to indicate that they exist and are about a relevant topic, but the actual detailed bibliographic information should be offloaded to Wikidata. As we mention in Setting up a Wikibase from scratch, for items that are in scope of both Metabase and Wikidata, we want to have a clear understanding of which of the projects should be "responsible".
Breadth vs. depth
When conducting the work, we had to strike a balance between describing things in enough detail to capture all the information we imagine is relevant, and using our limited staff resources efficiently. While it is tempting to research every single item in depth, the goal was, as mentioned previously, not to fully cover the available material, but rather to provide a broad range of examples of the sort of information Metabase could hold.
Topics
Given the purpose of Metabase, which is to make relevant Movement resources easier to locate, topics are central. Without accurate topic tagging, it will be impossible for users to sort through the material and identify what they are looking for. At the same time, we often ran into problems when trying to add keywords to presentations and the like, especially those from outside our areas of expertise. While many conference programs have topics assigned to the sessions, they are often broad, such as collaboration or GLAM. In order to identify more informative keywords, like Creative Commons licenses or art museums, you have to read the session description, which we could only do if the session in question was within our area of expertise and we could understand the description.
Lack of external contributions
It should be noted that many of the challenges listed here stem from the fact that we were working on a proof of concept. As mentioned previously, we do not expect Wikimedia Sverige to fill Metabase with data all by itself. We have prepared a platform, a foundation on which the community can build. Hopefully, with more users from outside of our organization, these problems will be resolved organically.
At the same time, we are aware that developing the initial structure of Metabase completely on our own creates certain limits. We have had a small number of people working on it, on and off, for a short period of time, which carries a risk that the modeling solutions we have developed will not prove usable to other members of the Movement. Only once they have tested contributing to and using Metabase, and given feedback on its structure and vision, can we know whether it actually is a useful project.
Conclusions and ideas for future work
The idea of collecting the resources created by the community is not new. For example, the Wikimedia Foundation made an attempt to aggregate the information about events and initiatives contained in the This Month in GLAM Newsletter. We are aware of and impressed by this information collection initiative, and we have started investigating ways to collaborate and make use of the data, so that it becomes accessible to more people through Metabase.
There's also been an initiative to collect the information about the different tools used in different stages of GLAM content partnerships projects, conducted as part of the preparatory work for establishing the Content Partnerships Hub. There have been attempts to catalog the many tools the Movement uses as well, such as the Toolhub and Hay's Tools Directory. Affiliates around the world have their tools and knowledge repositories, such as newsletters, wikis and blogs. All the conferences and other events we organize, from Wikimania to local meet-ups, are described in different places.
In other words, there is plenty of room to both make the work of knowledge collectors easier and set up knowledge seekers for success. We imagine that Metabase can become a hub where all these sorts of information can be collected on a shared platform, accessible to all. Information collectors in our Movement do amazing work in less than perfect conditions, spread across a plethora of platforms and tools. And collecting the information is only the beginning: it is not actually useful unless everyone interested is able to quickly and easily find what they are looking for.
However, for the platform to succeed, it is necessary that as many people as possible contribute to it. Our work with data input to date has been experimental and exploratory – we wanted to test several approaches to the data collection, from both an event and a thematic point of view. Wikimedia Sverige – or any one affiliate – does not have the skills or resources necessary to track all the activities of affiliates and volunteers around the world. We hope that we have shown that Metabase is a platform worthy of investing time and effort into, and we are looking forward to assisting those who would like to do that.