Please prepare a brief project update each month, in a format of your choice, to share progress and learnings with the community along the way. Submit the link below as you complete each update.
Paper about the underlying engine: Frey J, Hofer M, Hellmann S, Obraczka D. DBpedia FlexiFusion Best of Wikipedia > Wikidata > Your Data. ISWC Resources Track 2019 (submitted). Available at: https://svn.aksw.org/papers/2019/ISWC_FlexiFusion/public.pdf
We continue to improve the first deployments, which will be an ongoing process. In particular, the GFS Data Browser is being worked on:
users can now insert any Wikipedia URL into the subject search field
overall layout improvements
reference information is being added
Johannes Frey presented the GFS project at Wikimania
We created a news page within our Meta-Wiki project page framework to keep volunteers in the loop and encourage exchange. So far this has led to three more volunteers signing up for our 'GFS Feedback Squad' and two users leaving feedback about our sync target study.
identifying and testing ways to generate lists of Wikipedia articles related to selected topics, based on categories, infoboxes, Wikidata queries and other articles (lists).
Over the last month the output of our project was not very visible because we (1) worked a lot on the data and (2) had to deal with corona and all its consequences, such as missing child care. On the positive side, we have quite a lot of budget (9000€) left and would like to extend the project by four months as a budget-neutral extension. We need time until the end of September 2020. Project-wise, we found this dump: enwiki-20200401-wbc_entity_usage.sql.gz
- Tracks which pages use which Wikidata items or properties and what aspect (e.g. item label) is used.
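As a rough illustration of how such usage records could be summarised, the sketch below tallies aspect codes and collects the pages that use each item. It assumes the rows have already been extracted from the SQL dump as (entity id, aspect, page id) tuples. The sample rows, the item ids and the helper name `tally_aspects` are hypothetical, and the aspect codes (e.g. `L` for label, `C` for statements, optionally followed by a modifier such as `L.pl`) should be checked against the actual dump.

```python
from collections import Counter, defaultdict

# Hypothetical sample rows: (entity_id, aspect, page_id).
# Real rows would come from parsing enwiki-20200401-wbc_entity_usage.sql.gz.
SAMPLE_ROWS = [
    ("Q100", "L.pl", 101),
    ("Q100", "C.P1082", 101),
    ("Q100", "S", 102),
    ("Q200", "C.P1082", 103),
]


def tally_aspects(rows):
    """Count usages per base aspect letter and collect page ids per entity."""
    aspect_counts = Counter()
    pages_by_entity = defaultdict(set)
    for entity_id, aspect, page_id in rows:
        base = aspect.split(".", 1)[0]  # "C.P1082" -> "C"
        aspect_counts[base] += 1
        pages_by_entity[entity_id].add(page_id)
    return aspect_counts, pages_by_entity


counts, pages = tally_aspects(SAMPLE_ROWS)
print(dict(counts))
print({entity: sorted(ids) for entity, ids in pages.items()})
```

Grouping by the base letter keeps the summary small. A real report would probably also break statement usage down by property.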
We therefore consider it realistic to provide the following:
- We have one of the best infobox parsers and full information about all properties used there. This means we can produce a reliable Wikidata adoption report, which shows how much Wikidata is adopted, where it is well adopted in Wikipedia and where adoption can be improved.
- We can use this to calculate "good imports" from Wikipedia to Wikidata, i.e. where data in WP infoboxes is especially plentiful and well referenced, but missing in Wikidata
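One possible way to think about "good imports" is sketched below: keep the infobox facts that carry a reference and whose item/property pair has no statement in Wikidata yet. This is only a minimal sketch under stated assumptions. The `InfoboxFact` record, the function name `good_import_candidates` and all sample values are hypothetical and do not describe the actual GFS pipeline.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InfoboxFact:
    item: str            # Wikidata item the article is linked to
    prop: str            # Wikidata property the infobox field maps to
    value: str           # value as extracted from the infobox
    has_reference: bool  # True if the infobox value carries a citation


def good_import_candidates(infobox_facts, wikidata_statements):
    """Return referenced infobox facts whose (item, property) pair is absent in Wikidata."""
    existing = {(item, prop) for item, prop, _value in wikidata_statements}
    return [
        fact
        for fact in infobox_facts
        if fact.has_reference and (fact.item, fact.prop) not in existing
    ]


# Hypothetical sample data, for illustration only.
facts = [
    InfoboxFact("Q100", "P1082", "535802", True),   # referenced, missing in Wikidata
    InfoboxFact("Q100", "P2046", "261.9", False),   # missing, but unreferenced
    InfoboxFact("Q200", "P1082", "38000000", True), # already present in Wikidata
]
statements = [("Q200", "P1082", "38000000")]

candidates = good_import_candidates(facts, statements)
print([(fact.item, fact.prop) for fact in candidates])
```

A stricter variant could also require agreement across several language editions before suggesting an import.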
In addition, we started to index authoritative datasets that are often referenced in WP and WD. Taking this data from the source, we can build an interface, e.g. a user script, to suggest relevant data points from these datasets to users for inclusion. This part might be experimental, but it would work like this. The infobox on https://pl.wikipedia.org/wiki/Pozna%C5%84 shows:
Populacja (30.06.2019)
• liczba ludności 535 802[3]
Here, [3] references the population count (liczba ludności) from stat.gov.pl, which holds the official census data for Poland. If this figure gets updated at the source, we might be able to autodetect that a change is required either in the infobox or on Wikidata (which of the two is up to community policy).
This will not be complete, but it will probably work for 10-50 million entries in Wikipedia and Wikidata, depending on the quality of the source and how official it is.
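The detection step could look roughly like the following sketch: normalise the number shown in the infobox and compare it with the value taken from the authoritative source. The function names `parse_count` and `needs_update` are hypothetical, and the "newer" source value in the usage lines is invented purely for illustration.

```python
import re


def parse_count(text):
    """Turn an infobox count such as '535 802[3]' into an int.

    Footnote markers and group separators are dropped. This only suits
    whole numbers such as population counts, not decimal values.
    """
    text = re.sub(r"\[\d+\]", "", text)      # drop footnote markers like [3]
    digits = re.sub(r"[\s.,]", "", text)     # drop spaces (incl. non-breaking), dots, commas
    return int(digits)


def needs_update(infobox_text, source_value):
    """Return True if the infobox count differs from the source value."""
    return parse_count(infobox_text) != source_value


print(parse_count("535 802[3]"))
print(needs_update("535 802[3]", 535802))
print(needs_update("535 802[3]", 536000))  # 536000 is an invented newer value
```

Whether a detected difference leads to an edit in the infobox or on Wikidata would remain a community decision; the sketch only flags the mismatch.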
In the next few months we need to work on the following topics:
- incorporate demo (hard-coded) references view into GFS browser using the novel JSON references dump
- GFS browser features
- include mapping management to allow search for properties of new external sources
@Juliaholze: Hi Julia, thanks for this request and context over your remaining budget as well as the disruptions you experienced due to the pandemic. We can appreciate that work on the project needed to be paused in order to focus on other, more important priorities, as we have experienced these same needs at the Wikimedia Foundation as well. This extension until 30 September 2020 to complete the above activities is formally approved. Your final report will be due on 30 October 2020. I JethroBT (WMF) (talk) 21:25, 6 July 2020 (UTC)
@JethroBT (WMF): Hi Chris, many thanks for your reply. We will complete the above activities and tasks.
We would like to request another budget-neutral extension. The main reason is very similar to the previous one. We are currently in the process of adding many authoritative datasets to the GFS browser, which will then enable "official" data from the appropriate sources to be included in Wikipedia/Wikidata. In the next two months we need to work on the following topics:
GFS browser features
include mapping management to allow search for properties of new external sources
Please also see our email to the WMF Grants Administrator.
Since the beginning of December 2020 we have again been dealing with corona and all its consequences, such as a national lockdown and missing child care. I am sorry to inform you that we need more time to finish our final report for the GlobalFactSyncRE project.
We have already started to write the report and have requested bank statements to document all expenses. We need more time to summarize all project results and document the outcome. We hope that you and your families are safe and well, despite the disruptions and consequences of covid. Kind regards, Julia