Toolhub/Progress reports/2021-10-22
Report on activities in the Toolhub project for the week ending 2021-10-22.
Toolinfo schema v1.2.0 marked as stable
MusikAnimal noticed that toolinfo schema v1.2.0 was still marked as unstable ("draft02"). It has now been promoted to stable both in the on-wiki documentation and in the Toolhub git repository. Future toolinfo schema changes will increment the version number as semantically appropriate for the change.
Bug fixes
- backend: Add setting to toggle off urllib3 warnings
- logging: Add service.type to ECS log events
- crawler: Normalize toolinfo name before seen checks
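The crawler fix above can be illustrated with a minimal sketch. It assumes a simple normalization rule (Unicode NFKC, lowercase, runs of non-alphanumeric characters collapsed to a hyphen); the function names `normalize_name` and `dedupe` are hypothetical and are not taken from the Toolhub codebase, whose actual rule may differ.

```python
import re
import unicodedata


def normalize_name(name: str) -> str:
    """Hypothetical normalizer, not Toolhub's actual implementation."""
    # Fold compatibility characters, trim whitespace, and lowercase.
    name = unicodedata.normalize("NFKC", name).strip().lower()
    # Collapse every run of non-alphanumeric characters into one hyphen.
    return re.sub(r"[^a-z0-9]+", "-", name).strip("-")


def dedupe(records):
    """Keep the first record seen for each normalized name."""
    seen = set()
    unique = []
    for record in records:
        # Normalize BEFORE the seen check so spelling variants collide.
        key = normalize_name(record["name"])
        if key in seen:
            continue
        seen.add(key)
        unique.append(record)
    return unique
```

Without the normalization step, names such as "My Tool" and "my_tool" would be treated as distinct and both would pass the seen check.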
Figure out how to deal with duplicate toolinfo records
The topic of duplicate "projects" has been discussed this past week. Duplicates can appear for various reasons. Some are caused by Toolforge publishing a toolinfo record for a tool that has also created and published a toolinfo.json file itself (both crawler-managed). Others arise when a crawled toolinfo.json record (likely from Toolforge or a tool that scrapes an on-wiki listing) coexists with a toolinfo record created directly in Toolhub. It is also entirely possible for both duplicates to have been created directly in Toolhub.
Multiple authors
Samwilson has asked how best to model multiple authors in a toolinfo record. A current practice used by Toolforge is to provide a comma-separated list of names in the author field. A Phabricator task has been created to track the desire for a better long-term solution. This may also lead to implementing a more expressive 'user' object of some kind so that authors and other user-based fields in a toolinfo record can carry more detail, such as real name, Wikimedia user name, Developer account name, and contact information.
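As a rough illustration of the comma-separated convention, a consumer of such records might split the author field as follows. This is a sketch only; `split_authors` is a hypothetical helper, not part of Toolhub or the toolinfo schema.

```python
from typing import List


def split_authors(author: str) -> List[str]:
    """Split a comma-separated author string into trimmed names."""
    # Empty fragments (e.g. from a trailing comma) are dropped.
    return [part.strip() for part in author.split(",") if part.strip()]
```

A name that itself contains a comma would be split incorrectly under this convention, which is one reason a structured representation is attractive.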
Content ownership/modification model
The current content ownership/modification model used by Toolhub has been documented in our decision record. This model is a topic of discussion related to community editing of records. The Toolhub 1.0 "minimum viable product" has chosen to make editing a tightly controlled process, with only the creator of a toolinfo record and users with advanced patrolling rights able to edit records. Wikimedians are used to more permissive editing models, with patrolling providing controls against misinformation and vandalism. Opening up the ability to edit API-created toolinfo records is technically simple, but past concerns about sensitive data such as software license, and the suitability of current patrolling workflows, should be examined for additional requirements before implementation.
Growth of toolinfo in week 1
When Toolhub launched on 2021-10-14 there were 1472 records in the system. These came from various toolinfo.json URLs previously registered with Hay's Directory and new content scraped from enwiki. As of this writing on 2021-10-22 the directory contains 1631 records, an increase of 159 records (about 11%). The bulk of these new records come from work by Magnus Manske to ensure that all of his tools (including old, obsolete, and unmaintained tools) are documented. In addition, 22 new toolinfo records have been created directly in Toolhub by users.
Wrap up
The Toolhub team has been excitedly watching talk pages, Phabricator tasks, mailing lists, and error logs this week. Reception by the community thus far is positive. Users are interacting with Toolhub to explore its data. Some are exploring the boundaries of the current implementation and providing feedback about how well it met their expectations, along with ideas for future improvements.
The team will be meeting in the coming week to talk more about what is next for Toolhub and the team itself. Check back next week for more details about those future plans.