Grants:IEG/WikInfoboxer
Project idea
What is the problem you're trying to solve?
Wikipedia is centered around collaboratively creating and editing articles on a wide variety of topics and subjects. The information in these articles is often split into two parts: 1) unstructured text with details on the article’s subject and 2) a semi-structured infobox that summarizes the most important facts about the article’s subject. Infoboxes are a good way of summarizing the most important information not only for users but also for software applications. Thus, systems that use Wikipedia content (such as Google’s Knowledge Graph) usually prefer infoboxes, as they are easier for machines to process.
A common problem when creating an infobox for a Wikipedia article is determining which information is important enough to appear in it. To help editors with this, Wikipedia infoboxes are currently created from templates that are themselves created and maintained collaboratively. While templates provide a standardized way of representing infobox information across Wikipedia articles, they pose several challenges. Different communities use different infobox templates for the same category of articles; a template designed for one category of articles is used for other categories, so its attributes are misunderstood; attribute names differ (e.g., date of birth vs. birthdate); and attribute values are expressed using a wide variety of measurements and units. Finally, templates are free-form in nature: when users fill in attribute values, no integrity check verifies that the value is of the appropriate type for the given attribute, which often leads to erroneous infoboxes.
Guiding contributors in the creation of infoboxes would yield richer and more accurate information. This would help not only Wikipedia but also every system built on this information, from products (e.g., Google’s Knowledge Graph) to research tools (e.g., all the research projects consuming data from DBpedia).
What is your solution?
As part of our research, we have been working on a system to help contributors define infoboxes for Wikipedia articles. We are applying our previous experience with Semantic Web technologies, data mining, and recommendation to this problem. As a result, we designed the Infoboxer system, a tool grounded in Semantic Web technologies that overcomes challenges in creating and updating infoboxes and makes the process easier for users. We propose to use statistical and semantic information extracted from the Linked Open Data (LOD) dataset DBpedia to infer the most popular attributes used to describe pages of a given Wikipedia category. With this information, we can generate an infobox “template” automatically. Also, for each attribute we propose to identify the most popular expected types (e.g., for the property “birth place” the most popular values correspond to “settlements” and “countries”) and to provide users with suggestions for the actual values to enter (e.g., “USA”, “Germany”, and “Spain” are “countries” that could be used for such an attribute). The information inferred about expected types and values can be used to enforce semantic constraints on the values entered by the user (e.g., it would prevent editors from entering a link to a person for the attribute “birth place” of a page). It can also be used to provide links to existing pages in Wikipedia.
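The idea described above can be illustrated with a minimal Python sketch. It is not the Infoboxer implementation: the in-memory `FACTS` list is an invented stand-in for data extracted from DBpedia, and the function names (`popular_attributes`, `expected_types`, `suggest_values`, `is_valid`) are hypothetical. It only shows how attribute popularity, expected types, value suggestions, and a simple type constraint could be derived from usage counts.

```python
from collections import Counter, defaultdict

# Toy stand-in for DBpedia data (invented rows, for illustration only):
# (page, category, attribute, value, value_type)
FACTS = [
    ("Ada_Lovelace", "Scientist", "birth place", "London", "Settlement"),
    ("Ada_Lovelace", "Scientist", "field", "Mathematics", "AcademicDiscipline"),
    ("Marie_Curie", "Scientist", "birth place", "Warsaw", "Settlement"),
    ("Marie_Curie", "Scientist", "field", "Physics", "AcademicDiscipline"),
    ("Marie_Curie", "Scientist", "spouse", "Pierre_Curie", "Person"),
    ("Albert_Einstein", "Scientist", "birth place", "Germany", "Country"),
]


def popular_attributes(facts, category):
    """Rank attributes by the share of pages in the category that use them."""
    pages = {page for page, cat, *_ in facts if cat == category}
    used_by = defaultdict(set)
    for page, cat, attr, _value, _type in facts:
        if cat == category:
            used_by[attr].add(page)
    # Most used first; ties broken alphabetically for a stable order.
    ranked = sorted(used_by.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    return [(attr, len(users) / len(pages)) for attr, users in ranked]


def expected_types(facts, category, attribute):
    """List the value types observed for an attribute, most frequent first."""
    counts = Counter(
        vtype
        for _page, cat, attr, _value, vtype in facts
        if cat == category and attr == attribute
    )
    return [vtype for vtype, _n in counts.most_common()]


def suggest_values(facts, category, attribute, value_type):
    """Suggest already-used values of the given type for an attribute."""
    counts = Counter(
        value
        for _page, cat, attr, value, vtype in facts
        if cat == category and attr == attribute and vtype == value_type
    )
    return [value for value, _n in counts.most_common()]


def is_valid(facts, category, attribute, value_type):
    """Accept a value only if its type was observed for this attribute."""
    return value_type in expected_types(facts, category, attribute)
```

With this toy data, `popular_attributes(FACTS, "Scientist")` ranks “birth place” first because all three pages use it, and `is_valid(FACTS, "Scientist", "birth place", "Person")` rejects a link to a person. A real system would also need thresholds for rare attributes and a type hierarchy, which this sketch omits.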
We have developed a simple research prototype of Infoboxer to test our ideas (http://sid01.cps.unizar.es/). However, as a research prototype, it is not yet ready to be used by Wikipedia editors. Because the purpose of the prototype was to show the feasibility of our approach, we did not focus on challenging problems that should be solved before making the tool available to editors. For instance, the software architecture and technologies used should be reconsidered so that the tool can: 1) help create infoboxes for all the categories in Wikipedia, 2) support multiple users creating infoboxes at the same time, 3) provide an intuitive and user-friendly graphical interface, and 4) be robust against failures. In summary, this project would enable us to transform a simple research prototype into a useful tool ready to be used by editors to create infoboxes for Wikipedia pages.
Project goals
High-level goals:
- Increase the number of Wikipedia pages with infoboxes and the amount of data in Wikidata.
- Improve the quality of infoboxes as well as the amount of interesting information defined for each infobox and for each Wikidata entry.
- Make the creation of content (infoboxes) easier for Wikipedia and Wikidata editors.
Specific goals:
- Study mechanisms to efficiently store and query structured data to infer interesting information that should be included in an infobox.
- Analyze and redesign the architecture of our existing research prototype to consider usability and scalability issues.
- Develop and implement the new architecture.
- Adapt WikInfoboxer for the creation of Wikidata content.
- Evaluate the developed tool with users with different backgrounds and experience creating content for Wikipedia.
- Promote WikInfoboxer as a software tool that helps Wikipedia contributors to create high-quality Infoboxes.
- Present the techniques developed for WikInfoboxer to the scientific community as a mechanism to populate knowledge bases with instances.
In addition, if WikInfoboxer is accepted by the community of Wikipedia editors, we will adapt the tool so that it can be easily integrated into Wikimedia projects. We believe this project will benefit:
- Wikipedia editors, who will be able to create quality content for infoboxes easily, leading to improved articles.
- Systems based on extracting information from infoboxes such as Wikidata, research tools (e.g., DBpedia), and even commercial products (e.g., Google’s Knowledge Graph).
- Researchers in the area of Semantic Web who could use the ideas developed for WikInfoboxer in tasks such as the population of knowledge bases.
Project plan
Activities
Identifier | Months | Title | Description |
---|---|---|---|
T1 | 1 | Technology Review | 1) Study the state of the art for storing and querying structured data efficiently (for example, using RDF HDT). 2) Study recommendation techniques to improve the suggestion of values for the infobox the user is creating. |
T2 | 2-3 | Design architecture | 1) Redesign the architecture of the Infoboxer research prototype to address scalability issues. The tool must support the creation of infoboxes for every Wikipedia category, multiple users interacting concurrently, and different information sources (DBpedia, Wikidata). |
T3 | 3 | Design user interface | 1) Adapt the current user interface to address usability issues. The interface should help inexperienced editors create infoboxes easily. 2) Create a responsive Web interface to make the tool available on a wide range of devices (from desktop computer monitors to mobile phones). |
T4 | 2-5 | Implement main components | 1) Implement the architectural and interface components designed in Tasks T2 and T3, respectively. 2) Adapt the tool to facilitate its integration with the projects of the Wikimedia Foundation, especially Wikipedia and Wikidata. |
T5 | 4-6 | Evaluate WikInfoboxer | 1) Evaluate the performance of the tool under different circumstances (for example, creating infoboxes for different categories, stress tests, etc.). 2) Design different user tests to evaluate the usability and accessibility of the tool. 3) Perform tests involving users with different backgrounds (coming from different communities) and different levels of experience in Wikipedia editing (new editors, experienced editors, etc.). |
T6 | 5-6 | Encourage adoption of WikInfoboxer | 1) Present the tool at different events such as lessons, seminars, and editathons (e.g., within the University of Zaragoza). 2) Write research papers to show the scientific community the techniques developed and encourage them to use the tool. |
T7 | 3-6 | Community outreach and training | 1) Inform users about project results. Provide developer support and documentation. Gather feedback on design choices. |
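As a rough illustration of the kind of query that task T1 would need to make efficient, the following Python sketch builds a SPARQL 1.1 query that counts, for a given class, how many pages use each property. This is an assumption about one possible approach, not a description of the prototype: the proposal does not state that SPARQL is used, the helper name `attribute_usage_query` is hypothetical, and the class IRI in the usage note is only an example.

```python
# Characters that must not appear inside a SPARQL IRI reference (<...>).
_FORBIDDEN_IRI_CHARS = set('<>"{}|\\^` ')


def attribute_usage_query(class_iri: str, limit: int = 20) -> str:
    """Build a SPARQL 1.1 query ranking properties by how many pages use them.

    Hypothetical helper: it only builds the query text and sends nothing.
    """
    if not class_iri.startswith(("http://", "https://")):
        raise ValueError("class_iri must be an absolute http(s) IRI")
    if any(ch in _FORBIDDEN_IRI_CHARS for ch in class_iri):
        raise ValueError("class_iri contains characters not allowed in an IRI")
    if limit < 1:
        raise ValueError("limit must be a positive integer")
    return (
        "SELECT ?property (COUNT(DISTINCT ?page) AS ?pages)\n"
        "WHERE {\n"
        f"  ?page a <{class_iri}> ;\n"
        "        ?property ?value .\n"
        "}\n"
        "GROUP BY ?property\n"
        "ORDER BY DESC(?pages)\n"
        f"LIMIT {limit}"
    )
```

For example, `attribute_usage_query("http://dbpedia.org/ontology/Scientist", limit=10)` returns the query text for the ten most used properties of that class. Running such aggregate queries for every category is the costly step that compact storage formats such as RDF HDT are meant to speed up.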
Budget
- Project Management: 0€
- Roles: Overall management and direction; designing architecture; managing tests with users, etc.
- Software development costs: 12,500€
- This is a stipend based on the following calculation: 25 weeks (6 months), 20 hours/week, 25€/hour
- Roles: Design and implementation of the new architecture
- Conference travel: Promote WikInfoboxer in the scientific community
- ISWC 2016 - travel (900€), conference registration (750€), accommodations (550€): 2,200€
- WWW 2017 - travel (900€), conference registration (750€), accommodations (550€): 2,200€
- Organization of events: Promote WikInfoboxer through Editathons and/or competitions.
- These events will enable us to evaluate the tool and obtain feedback from the participants. They would also help encourage people to become Wikimedia users and contributors.
- We estimate an amount to cover the reservation of spaces and promotion of the event: 900€
- Total budget: 17,800€
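The figures above can be checked with a short calculation. The sketch below uses only the amounts listed in this budget; the variable names are ours.

```python
# Stipend: 25 weeks at 20 hours/week and 25 EUR/hour.
WEEKS = 25
HOURS_PER_WEEK = 20
RATE_EUR_PER_HOUR = 25
software_development = WEEKS * HOURS_PER_WEEK * RATE_EUR_PER_HOUR

# Each conference: travel + registration + accommodations.
conference_trip = 900 + 750 + 550

budget_eur = {
    "Project management": 0,
    "Software development": software_development,
    "ISWC 2016": conference_trip,
    "WWW 2017": conference_trip,
    "Organization of events": 900,
}
total_eur = sum(budget_eur.values())

for item, amount in budget_eur.items():
    print(f"{item}: {amount} EUR")
print(f"Total: {total_eur} EUR")
```

The stipend comes to 12,500€, each conference to 2,200€, and the sum of all items to 17,800€, matching the total stated above.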
Sustainability
Our purpose is to continue working on Infoboxer to develop WikInfoboxer, both in the SID research group, where all members involved in the project work, and in the wider Wikimedia community. Thus, the availability of the query service demonstrator can be assured beyond the end of the project. The whole project will initially be hosted on the web servers of the SID group, where adequate resources to support 30 concurrent users are available. Nevertheless, if the project is successful, we would like to host WikInfoboxer on Wikimedia Labs in order to leverage Wikimedia integration. We therefore want to stress that we will work within the resource utilization guidelines provided by Wikimedia Labs administrators. The applicants' own interest in the project stems from their interest in working with structured data sources in the research areas of Semantic Web and Data Management, in which they have been involved for 10 years. In addition, several activities to foster community involvement in this project have been planned:
- Code will be developed with reusability and maintainability in mind. Thus, documentation of the code, manuals, and the publication of tutorials and examples are scheduled.
- Code will be completely open and will be available to download or fork in the GitHub repository.
- The project will be presented to wider communities in international conferences focused on semantic web and data management issues.
- Students in the Computer Science Degree of the University of Zaragoza will be encouraged to participate in the development and maintenance of the project.
Measures of success
End of the project:
- Launch WikInfoboxer.
- At least 60 infoboxes created by means of WikInfoboxer.
- At least 20 Wikimedia users using WikInfoboxer.
Midpoint of the project:
- Involvement of at least one researcher specializing in compact data structures to speed up the response time of WikInfoboxer.
- Positive evaluation of the tool by at least 10 non-technical users (volunteers), each of whom will create at least two infoboxes: one using the current interface of Wikipedia and one using WikInfoboxer.
Get involved
Participants
Grantee Roberto Yus will be the co-project manager for WikInfoboxer. Roberto is a postdoctoral fellow in the Computer Science and Systems Engineering Department of the University of Zaragoza, Spain. His research interests include the Semantic Web and the development of tools leveraging it. He has been involved in the development of the Infoboxer research prototype. He has participated in several Wikipedia editathons.
Co-Grantee Raquel Trillo-Lado will be the co-project manager for WikInfoboxer. Raquel is an associate professor in the Computer Science and Systems Engineering Department of the University of Zaragoza. She is also the co-leader of the Representation of structured data sources working group (WG1) of the COST action Keystone: semantic KEYword-based Search on sTructured data sOurcEs (IC-1302). Raquel has participated in the organization of several events related to Wikimedia: WikinformaticA en Aragón (2015), Donostia-San Sebastián WikiData Editathon, Editathon Wikimañas, and WikinformaticA en Aragón (2016).
Co-Grantee Ismael Rodríguez will be the software developer for WikInfoboxer. Ismael is an undergraduate student at the University of Zaragoza. He has been involved in the development of the Infoboxer research prototype and has experience in the development of plugins for MediaWiki.
Community Notification
We have notified the following communities in order to get their support and feedback:
- https://en.wikipedia.org/wiki/Wikipedia_talk:WikiProject_Infoboxes
- https://en.wikipedia.org/wiki/Help_talk:Infobox
- https://en.wikipedia.org/wiki/Category_talk:WikiProject_Infoboxes_participants
- https://en.wikipedia.org/wiki/Help_talk:Designing_infoboxes
- https://en.wikipedia.org/wiki/Wikipedia_talk:Manual_of_Style/Infoboxes
- https://en.wikipedia.org/wiki/Wikipedia_talk:WikiProject_Templates
Endorsements
Do you think this project should be selected for an Individual Engagement Grant? Please add your name and rationale for endorsing this project in the list below. (Other constructive feedback is welcome on the talk page of this proposal).
- Community member: add your name and rationale here.