Visual Analytics for Sustainability and Climate Change/DMP

Data Management Plan (DMP) / Research Data Plan (RDM) of the project Visual Analytics for Sustainability and Climate Change.

Last update: November 2024, Iolanda Pensa

Guiding Principles

  • Open by default: as open as possible, as closed as needed
  • Easy to find, cite and reuse
  • FAIR data principles
  • CARE data principles
  • Proportionality: only use data you need
  • Delete unnecessary data
  • Ensure the security of the data which needs to be protected
  • Respect the communities and practices of the Wikimedia projects, inform them and involve them in relevant decisions

Policies and requirements


The project "Visual Analytics for Sustainability and Climate Change: Assessing online open content and supporting community engagement. The case of Wikipedia" is promoted by SUPSI, supported by SNSF, and involves the Wikimedia and Creative Commons communities. It refers to the following policies:

Terms included in the project


The research project Visual Analytics for Sustainability and Climate Change contributes to the Wikimedia projects; this requires the interoperability of the research project data with the Wikimedia projects. The default license of Meta-wiki and Wikipedia project pages is CC BY-SA and Wikidata uses the open tool CC0.

  • Research content will be released by default with the double license CC BY-SA and CC BY; data will be released in CC0.
  • The authors of the interactive visualization tool will have all non-exclusive rights on the tool and the tool with its code will be released with an open license.
  • Data is collected, produced and stored according to the FAIR and CARE principles.
  • The project whenever possible uses open and libre software.
  • Partners and participants in the interviews will have to sign a consent to release their interviews anonymised under CC0 or attributed under CC BY 4.0; participants in the interviews will be able to decide if they want their recordings and transcript to be stored privately or publicly or eliminated at the end of the research.
  • Datasets are associated with a DOI on Zenodo.
  • The project documentation is also made accessible on Meta-wiki, OSF Open Science Framework, and the SUPSI repository.

Wikimedia recommendations


Relevant contacts: Wikimedia Research https://research.wikimedia.org/

Data produced and collected during the different phases of the project


The main data collection needed for the project is related to the selection of articles. The selection of articles relies on existing lists created by the Wikimedia communities.

WPs / Activities | Data produced (primary data) | Data collected (secondary data) | Ethical issues, privacy, security, copyright issues
0 Project design and management
  • Project description
  • Project budget
  • Administrative documents
Bills

Data related to external collaborators

  • Administrative files and data about collaborators need to be private and protected.
  • The research needs to be notified to the Wikimedia Foundation.
1 Definition of the dataset of articles to be investigated
  • Set of articles in four languages
  • Clusters of articles
  • Data related to the coverage of the selected articles on Wikipedia in Spanish, French and Italian
List of articles and Wikidata identifiers of WikiProject Climate Change, a collaboration area for volunteers interested in improving coverage of climate change on Wikipedia in English. 4350 articles in March 2024.

Bibliography / references. Data related to other visualisation tools.

Working with the Wikimedia communities implies keeping all the documentation accessible and transparent and informing the related communities.
2 Definition of user requirements
  • Five semi-structured qualitative interviews with researchers in the fields of sustainability and climate change and volunteers interested in research related to Wikipedia articles
  • One online focus group of an hour and a half with six representatives of the institutions “Wiki in Africa” and “Wikimedistas de Uruguay” and of the international project “Open Climate Campaign”
  • Discussions with the online communities of Wikipedia take place through Wikipedia talk pages related to sustainability and climate change
  • 2 online meetings with online communities of Wikipedia
  • Report describing the user requirements, shared with the communities on Meta-wiki

We could also implement a survey which is not included in the project description but could be useful.

  • Informed consent, attribution, data management and license collected from participants in the interviews, focus groups and meetings.
  • The project description already states "Partners and participants in the interviews will have to sign a consent to release their interviews anonymised under CC0 or attributed under CC BY 4.0; participants in the interviews will be able to decide if they want their recordings and transcript to be stored privately or publicly or eliminated at the end of the research."
  • All members of the research need to state in their user page on the Wikimedia projects that they are involved in the project (COI).
  • When involving the communities in the discussions, it is necessary to clearly state the research project, its purpose and how the content collected will be used.
3 Implementation of the visualisation tool
  • A set of online documents which integrate the materials from the other WP and support workshop activities
  • A participatory design workshop online involving the entire research team with a plenary session and 3 working groups. The workshop lasts around 2 hours and uses online boards to brainstorm and sketch visual interfaces and solutions
  • Report of the participatory design workshop
  • Visualisation tool implemented through the use of open web technologies: three releases
  • Informed consent, attribution, data management and license collected from participants (see above)
4 Data analysis and validation of the visualisation tool
  • Production of the form with a set of questions about the overall experience
  • Feedback from the first user tests of the visualisation tool collected through the form
  • Report on the first fixes
  • Feedback from the second user tests of the visualisation tool collected through the form
  • Report on the second fixes
  • Collaborative online analytics workshop involving the entire research team with 3 working groups and a final plenary session, to analyse the data emerging from the visualisations. It lasts around 2 hours
  • Report of the collaborative analytics workshop
  • Paper about the quality of Wikipedia articles related to sustainability and climate change
  • Reports about the use of the tool by institutions
  • One-day edit-a-thons organised in three different locations
  • Two online “Writing Weeks”
  • Report and data from online monitoring tool for events (Wikimetrics and Programs and Events Dashboard), performance metrics, participant observation, feedback and semi-structured qualitative interviews.
  • The form includes informed consent, attribution, data management and license collected from participants (see above)
  • For the online writing weeks and the edit-a-thons it is also important to notify participants about the research project and to collect informed consent
5 Impact Evaluation
  • Qualitative interviews with all partners to evaluate the impact of the tool on decision-making, on support in designing strategies and on communication with stakeholders
  • Report about the engagement of volunteers and their feedback, based on data analysis, comparative analysis, performance metrics, participant observation and analysis of talk pages and events
  • Paper
  • Documentation uploaded on Meta-wiki
  • Discussions with the communities about the results of the project

We could also implement a survey which is not included in the project description but could be useful.

  • Informed consent, attribution, data management and license collected from participants (see above)
  • For the online discussions it is important to notify the communities about the research project
6 Research dissemination
  • Submission to International Conference on Information Visualisation Theory and Applications (IVAPP) 2028
  • Submission to FOSDEM (Free and Open source Software Developers’ European Meeting) in Brussels, February 2026
  • Submission to DARIAH (Digital Research Infrastructure for the Arts and Humanities) international meeting
  • Submission to DARIAH-CH study day
  • Submission to Graph Conference
  • Submission to Wikimania 2026 in Paris
  • Submission to Wikimania 2027
  • Paper about climate change and the results of the visualisations
  • Paper about the tool and its scalability (possible journals First Monday, the Journal of Open Humanities Data, New Media and Society, and Social Media and Society)
  • Data from Wikidata need to remain in CC0
  • Important to give back to the communities (also related to CARE principles): if new or improved data are produced, they need to benefit the communities; papers need to be notified to the communities.
  • Papers need to include the correct credits, with all authors listed in the correct order.
  • All papers are open access.
  • Data are accessible and associated with the papers.

Management of the data

Data, documentation Size estimate Software, formats License, terms Ethics Temporary storage Collaboration with online open communities Preservation plan
Privacy, confidentiality Criticality (0-3) Necessary actions Wikimedia Meta-Wiki Wikidata Wikimedia Commons OSM OSF (with DOI) Zenodo (with DOI)
Project description < 1 gb Meta-wiki https://meta.wikimedia.org (MediaWiki open and libre software), .doc, .pdf Double license CC BY-SA and CC BY Not including budget and administrative data. Research team members may want to be removed from the team along with their data/mentions 1 Removing budget Onedrive SUPSI Complete project Visual Analytics for Sustainability and Climate Change d:Q130394984 OSF
Administration (budget, contracts, bills...) < 1 gb .doc, .xlsx © Confidential, restricted access, to be deleted. 3 To be deleted SUPSI servers or Onedrive SUPSI
Selection of articles < 1 gb Wikidata and Wikipedia project pages (open and libre software) CC0 / Data deposited on Zenodo with DOI
Discussions on the Wikimedia projects (talk pages and project pages) < 1 gb Meta-wiki and Wikipedia and Wikidata pages (MediaWiki open and libre software) CC BY-SA Public discussions signed by participants with their usernames Wikimedia projects
Survey < 1 gb Data collected in CC0 Possibility to provide an email and to be recontacted. Data of the participants/reviewers managed accordingly. Data need to be aggregated and anonymised before making them accessible Report of the survey in CC BY and CC BY-SA on Meta-wiki.

Data stored on OSF.

Data deposited on Zenodo with DOI.

Co-creation workshop < 1 gb Onsite and online with BBB BigBlueButton or Jitsi (conference tool - open and libre software) Texts and images in CC BY-SA and proposals and ideas in CC0 with attribution to the members of the team in the page “credits” Agreement with the participants on the license and the attribution. Report of the event on Meta-wiki. Images on Wikimedia Commons. Data stored on OSF.

Data deposited on Zenodo with DOI.

Interactive Visualization Tool: API integration and the data visualization system < 1 gb Server: Debian GNU/Linux stable GNU Affero General Public License v3+ (GNU AGPL v3+) or MIT https://opensource.org/license/mit/ The tool does not store personal data
Code of the Interactive Visualization Tool < 1 gb Git repository Wikimedia GitLab  https://gitlab.wikimedia.org/
Documentation of the interactive visualization tool < 1 gb / Double license CC BY and CC BY-SA Documentation on WikiTech
Data used and produced by the interactive visualization tool < 1 gb / Data in CC0, visualizations in CC BY /
User tests at the end of the two releases of the interactive visualization tool < 1 gb Participants contribute online on MediaWiki (open and libre software) using their usernames Texts and images in CC BY-SA. Ideas and proposals in CC0 with attribution to the members of the team in the page “credits” Participants who want to provide confidential feedback can send messages which are stored privately on SUPSI servers or SWITCH folders Report on Meta-wiki.

Data stored on OSF.

Data deposited on Zenodo with DOI.

Data of the participants / reviewers < 1 gb Participants contribute online on MediaWiki (open and libre software) using their usernames / In case of personal information related to participants and reviewers those are managed privately on SUPSI servers or SWITCH folders Wikimedia projects for usernames.

SUPSI servers or SWITCH folders for private information

Events to improve articles (challenges, online thematic weeks, campaigns, edit-a-thon…) Participants contribute online on MediaWiki (open and libre software) using their usernames CC BY-SA and CC0 for Wikidata Contributions signed with usernames. Wikimedia projects (mainly Wikipedia, Wikimedia Commons and Wikidata)
Wikimedia Conferences - Conference papers / Slides, associated data and recordings in CC BY and CC BY-SA (default license of Wikimedia conferences) / Zenodo with DOI https://zenodo.org, Wikimedia Commons and Wikimania website
Conferences - Conference papers / Slides, associated data and recordings in CC BY / Zenodo with DOI https://zenodo.org
Reports / CC BY and CC BY-SA / Meta-wiki and Zenodo with DOI https://zenodo.org
Scientific articles / CC BY or CC BY-SA / Gold or Diamond open access publications. Articles deposited also on SUPSI repository and Zenodo.

Methods

Method Instruments Procedures Quality measurement
Interactive visualisation
Qualitative interviews
Survey

Files and folder naming and formats

  • Files are named using the date, the version (if necessary), the initials of the author (if necessary), subject type and subject.
  • The version of the file is provided by the date, the version (if necessary) and the author (if necessary)

List of articles


Collaborations and feedback


Files containing information about interviews, transcriptions and notes are named as follows:

  • date [year-month-day]-initials of the author [e.g. IP]-interview-name of the interviewee
  • date [year-month-day]-transcription-name of the interviewee
  • date [year-month-day]-notes-name of the interviewee
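
The naming patterns above can be sketched as a small helper. This is only an illustration, not a tool used by the project: the function name `build_file_name`, the `KINDS` tuple, the sample name "Jane Doe" and the choice to replace spaces with underscores are all assumptions made for the example.

```python
from datetime import date
from typing import Optional

# Document kinds named in the naming patterns of this plan.
KINDS = ("interview", "transcription", "notes")


def build_file_name(day: date, kind: str, interviewee: str,
                    author_initials: Optional[str] = None) -> str:
    """Return a name of the form date[-initials]-kind-interviewee."""
    if kind not in KINDS:
        raise ValueError(f"unknown kind: {kind!r}")
    parts = [day.isoformat()]  # ISO 8601: year-month-day
    if author_initials:
        parts.append(author_initials)
    parts.append(kind)
    # Assumption: spaces in names become underscores so that the hyphen
    # stays reserved as the field separator.
    parts.append("_".join(interviewee.split()))
    return "-".join(parts)


print(build_file_name(date(2024, 11, 5), "interview", "Jane Doe", "IP"))
print(build_file_name(date(2024, 11, 5), "notes", "Jane Doe"))
```

Generating names from one function rather than typing them by hand keeps the date format and field order consistent, so files sort chronologically in any file browser.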

Reports

  • date [year-month-day]-interview-

Metadata


Minimal metadata provided: author/s, license,

Tools and repositories used

Tool/repository Description Strengths Critical issues Use within the project Safety/privacy (0-3)
Wikimedia Meta-Wiki The website of the Wikimedia communities, used for multilingual and multi-project content and coordination. Under CC BY-SA by default Open for collaboration

Default license CC BY-SA

We use Meta-Wiki to present and document the project and to publish updates and resources 0
Wikidata The repository of structured open data of the Wikimedia Communities. Under CC0 Open for collaboration

CC0 The largest existing open data repository with linked open data and structured data. Largely used and makes data very accessible and easy to reuse.

The default license for research - and also for data - is CC BY (not CC0)

Items and data from research projects may not be considered relevant by the Wikidata communities.

They may be deleted by the community

We use data from Wikidata and we enrich Wikidata with our data. In particular we need to cluster content and we will do it on Wikidata 0
Wikimedia Commons The repository of multimedia files of the Wikimedia communities. Content is published under Public domain, CC0, CC BY, CC BY-SA or similar. Makes images very visible and easy to be reused. 0
Zenodo Research repository managed by CERN and financed by the European Union. The most well-known generic research repository. FAIR repository. https://zenodo.org/ Designed for research outputs

Permanent archiving (and reliable promoter/sponsor)

Consolidated database for open access and open data

Allows generating DOIs

Very suitable for publications

Guided process to produce metadata

Possibility to create a community related to your research project

Not a repository specifically for research in the humanities

Generic repository

You can upload data in folders

1
OSF Open Science Framework Research repository organised by project. It is managed by the Center for Open Science. It is designed to document projects but also to support collaborative work. FAIR repository. https://osf.io Designed for research

Organised by project

Allows generating DOIs

Access to projects can be restricted

Less well-known than Zenodo

More complicated than Zenodo to add metadata

We use OSF for the project by creating a specific project 1
Wikimedia GitLab
GitHub
Toolforge the Wikimedia Foundation hosting service for community tools https://admin.toolforge.org/ / https://wikitech.wikimedia.org/wiki/Help:Toolforge/Developing_successful_tools
LimeSurvey an open and libre software for surveys https://meta.wikimedia.org/wiki/LimeSurvey
Microsoft OneDrive
Files and folders on SUPSI servers
Files and folders on SWITCH
Google drive - free service
Social media SUPSI
Calls on Teams
Calls on BBB BigBlueButton
Conferences and webinars on BBB BigBlueButton
Files and folders on personal computer
Files and folders on external hard drive

Backups


Risks and mitigation plans

Risk Mitigation plan

Attribution
