Grants:IEG/Wikiscan multi-wiki/Midpoint
This project is funded by an Individual Engagement Grant
Welcome to this project's midpoint report! This report shares progress and learnings from the Individual Engagement Grantee's first 3 months.
Summary
In a few short sentences or bullet points, give the main highlights of what happened with your project so far.
The project is on track. I consider the main objective (multi-wiki support) achieved, but it took more time than expected to make it efficient.
Methods and activities
How have you set up your project, and what work has been completed so far?
Describe how you've set up your experiment or pilot, sharing your key focuses so far and including links to any background research or past learning that has guided your decisions. List and describe the activities you've undertaken as part of your project to this point.
I work alone, so not much organization is needed. I track the time spent on each task and use a local git repository. I started by focusing on the most important features: multi-wiki support and the new home pages.
Midpoint outcomes
What are the results of your project or any experiments you’ve worked on so far?
Please discuss anything you have created or changed (organized, built, grown, etc) as a result of your project to date.
- Wikiscan has been extended from a single wiki to the 300+ largest Wikimedia wikis. This was the main goal of the grant and its biggest challenge. The beta version has been publicly available at wikiscan.org since October 10.
- Raw data for all wikis is pulled directly from the Wikimedia Labs database; connections use compression to reduce bandwidth.
- The worker system is working well. I improved it more than I initially planned by adding a master worker that runs small worker units.
- Two new status pages monitor wiki updates: one large table with detailed information for each wiki [5] and one page for the master worker [6].
- The new global homepage displays history graphs for each wiki (users/edits/pages, with different colors for users/IP/bots). Wikis are ordered by an internally calculated score; the key factors are the number of contributors and the wiki size, and bot edits count for less.
- Each wiki has a new homepage with several new charts.
- I have done a lot of restructuring of the internal statistics calculations.
- User stats have been optimized for big wikis, in particular for English Wikipedia.
- Multi-language support and the English translation are in progress.
- Global home page [1]
- Wiki home page (English Wikipedia) [2]
- Wiki update status [3]
- Master worker status [4]
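The report says only that the global homepage ordering uses an internal score in which the number of contributors and the wiki size are the key factors and bot edits count for less. The exact formula is not given, so the following is a purely illustrative sketch: the field names, the logarithmic scaling, the weights and the `BOT_EDIT_WEIGHT` constant are all assumptions, not Wikiscan's actual calculation.

```python
import math
from dataclasses import dataclass


@dataclass
class WikiActivity:
    """Hypothetical per-wiki totals; the field names are assumptions."""
    name: str
    contributors: int
    pages: int
    human_edits: int
    bot_edits: int


# Assumed down-weighting factor: one bot edit counts as a tenth of a human edit.
BOT_EDIT_WEIGHT = 0.1


def ranking_score(wiki: WikiActivity) -> float:
    """Combine contributors, size and weighted edits into one score.

    Logarithms keep a very large wiki from dwarfing every other factor.
    Contributors get a double weight because the report names them first.
    """
    weighted_edits = wiki.human_edits + BOT_EDIT_WEIGHT * wiki.bot_edits
    return (
        2.0 * math.log1p(wiki.contributors)
        + math.log1p(wiki.pages)
        + math.log1p(weighted_edits)
    )


def rank_wikis(wikis: list[WikiActivity]) -> list[WikiActivity]:
    """Return wikis ordered from highest to lowest score."""
    return sorted(wikis, key=ranking_score, reverse=True)
```

With this shape, two wikis of the same size and edit volume are ordered by how much of that volume comes from people rather than bots, which matches the behaviour described for the homepage.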
Finances
Please take some time to update the table in your project finances page. Check that you’ve listed all approved and actual expenditures as instructed. If there are differences between the planned and actual use of funds, please use the column provided there to explain them.
Then, answer the following question here: Have you spent your funds according to plan so far? Please briefly describe any major changes to budget or expenditures that you anticipate for the second half of your project.
I have spent more time than planned on several core features:
- The worker system needed more improvements than I initially thought: +5 hours to add a master worker and +3 hours for its status page (lines 3 and 4).
- Global home page and wiki home page: these visual pages are very important for visitors. It takes a long time to choose what to display, to find a graphics library or build my own, and to make the charts and page layouts. +2 hours for the global home page, +4 hours for the wiki home page (lines 5 and 6).
- Restructuring and additions to statistics: I spent more time on this, mainly to provide global statistics to the new home pages efficiently, +4 hours (line 17).
- Scaling user stats to English Wikipedia needed many more optimizations to keep the server running smoothly. I used 7 unplanned hours for this (new line 19).
The first half included the most difficult and important tasks, especially scaling to 300+ wikis with support for English Wikipedia. The second half should be easier, with less unexpected extra time. If an important feature requires much more time, some secondary improvements can be postponed to another grant.
Learning
The best thing about trying something new is that you learn from it. We want to follow in your footsteps and learn along with you, and we want to know that you are taking enough risks to learn something really interesting! Please use the below sections to describe what is working and what you plan to change for the second half of your project.
What are the challenges
What challenges or obstacles have you encountered? What will you do differently going forward? Please list these as short bullet points.
The main challenge was to transform a site designed for a single wiki into a site covering 300+ wikis. Processing English Wikipedia was particularly difficult because it is much larger than any other wiki: Wikiscan was designed for French Wikipedia, which has 700,000 users, and enwiki is ten times larger. I try to stay on an affordable server; the current server costs €40/month.
- User statistics calculations did not scale well to English Wikipedia. I knew the process would be slower, but it consumed too many server resources and slowed down updates of other wikis. I had to try several improvements to the user stats tables. The main optimization was to allow updates in small chunks so that the data stays in memory, which avoids creating a big temporary table on disk (the server has only one effective SATA drive).
- The initial worker system worked, but it had some limitations. Each worker was run from the command line and picked the next wiki to update inside an infinite loop. It was easy to add new workers, but difficult to follow which wiki was currently being updated and to find a particular update in the log files. It was also difficult to stop a worker without interrupting its current task. Adding a master worker that runs small worker units when needed was very useful: the maximum number of units running at the same time can be changed at runtime, and each unit has its own log file named after the wiki and the update type. The new master page shows which wikis are updating, the previous updates and the next ones. The master worker also collects stats for each worker type and wiki size, which was useful to measure the benefits of optimizations and to find the best maximum number of running units.
- The project plan is very detailed, but it is hard to predict how much time I will spend on each task. Sometimes I have to switch rapidly from one task to another because they depend on each other. If I do another grant with a lot of software development, I will try to group similar tasks together.
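The master worker design described above can be sketched as follows. This is a minimal illustration under stated assumptions, not Wikiscan's implementation: it uses Python threads as the "units", and the class name `MasterWorker`, its methods and the `<wiki>_<update_type>.log` naming pattern are all hypothetical. It shows the four properties the report mentions: a concurrency limit that can be changed at runtime, one log file per unit, stopping without interrupting running units, and per-type timing stats.

```python
import threading
import time
from collections import defaultdict, deque
from pathlib import Path


class MasterWorker:
    """Runs small worker units, at most `max_units` at the same time."""

    def __init__(self, log_dir, max_units=2):
        self.log_dir = Path(log_dir)
        self.log_dir.mkdir(parents=True, exist_ok=True)
        self.max_units = max_units
        self.accepting = True              # False once stop() is requested
        self.pending = deque()             # queued (wiki, update_type, job)
        self.running = {}                  # (wiki, update_type) -> Thread
        self.durations = defaultdict(list) # update_type -> seconds per run
        self._lock = threading.Lock()

    def enqueue(self, wiki, update_type, job):
        with self._lock:
            self.pending.append((wiki, update_type, job))

    def set_max_units(self, n):
        """Change the concurrency limit while the master is running."""
        with self._lock:
            self.max_units = n

    def stop(self):
        """Start no new units; units already running finish their task."""
        with self._lock:
            self.accepting = False

    def _run_unit(self, wiki, update_type, job):
        # One log file per unit, named after the wiki and the update type.
        log_path = self.log_dir / f"{wiki}_{update_type}.log"
        started = time.monotonic()
        try:
            with open(log_path, "a", encoding="utf-8") as log:
                log.write(f"start {wiki} {update_type}\n")
                job(log)
                log.write("done\n")
        finally:
            with self._lock:
                self.durations[update_type].append(time.monotonic() - started)
                del self.running[(wiki, update_type)]

    def step(self):
        """Start pending units while there is room under the limit."""
        with self._lock:
            while (self.accepting and self.pending
                   and len(self.running) < self.max_units):
                wiki, update_type, job = self.pending.popleft()
                unit = threading.Thread(
                    target=self._run_unit, args=(wiki, update_type, job))
                self.running[(wiki, update_type)] = unit
                unit.start()

    def run_until_idle(self, poll=0.01):
        """Loop until nothing is running and nothing more will be started."""
        while True:
            self.step()
            with self._lock:
                nothing_to_start = not self.pending or not self.accepting
                if not self.running and nothing_to_start:
                    return
            time.sleep(poll)
```

A real deployment would more likely launch separate processes than threads and would avoid queuing the same wiki and update type twice, since this sketch uses that pair as the key for running units.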
What is working well
What have you found works best so far? To help spread successful strategies so that they can be of use to others in the movement, rather than writing lots of text here, we'd like you to share your finding in the form of a link to a learning pattern.
I have spent more time than initially expected on optimizations, but I think it was really worth it: the server now runs smoothly, even with 3 big wikis updating at the same time. The main "recent update" for English Wikipedia initially ran once a day for all users; it now runs twice a day. All other wikis are updated every 2 hours.
Next steps and opportunities
What are the next steps and opportunities you’ll be focusing on for the second half of your project? Please list these as short bullet points. If you're considering applying for a 6-month renewal of this IEG at the end of your project, please also mention this here.
- The first feedback from the French community is encouraging [7] [8] [9].
- Wikiscan was first designed for Wikipedia, so I need to improve support for other wikis:
  - Use MediaWiki "ContentNamespaces" to treat extra namespaces as main namespaces.
  - Improve bot detection with a global bot table.
  - The home page for Commons should display stats on files instead of articles.
- I don't plan to renew this grant immediately at the end of the project, but I do plan to apply for another grant to further improve the site after a short break, maybe in September 2017.
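For the "ContentNamespaces" item in the list above, one possible approach is to read each wiki's content namespaces from the MediaWiki API (`action=query&meta=siteinfo&siprop=namespaces`), where content namespaces carry a `content` flag. The sketch below only parses an already-fetched response. The `SAMPLE_SITEINFO` dictionary is hand-written for illustration and its exact shape is an assumption, as is the helper name `content_namespace_ids`; this is not how Wikiscan necessarily does or will do it.

```python
def content_namespace_ids(siteinfo: dict) -> list[int]:
    """Return the IDs of namespaces flagged as content namespaces.

    Assumes the `namespaces` object is keyed by namespace ID. The `content`
    flag is treated as set when the key is present and not False, so both an
    empty-string marker and a boolean True are accepted.
    """
    namespaces = siteinfo["query"]["namespaces"]
    return sorted(
        int(ns_id)
        for ns_id, ns in namespaces.items()
        if ns.get("content", False) is not False
    )


# Hand-written sample, not real data from any wiki.
SAMPLE_SITEINFO = {
    "query": {
        "namespaces": {
            "0": {"id": 0, "name": "", "content": True},
            "1": {"id": 1, "name": "Talk", "content": False},
            "2": {"id": 2, "name": "User"},
            "100": {"id": 100, "name": "Portal", "content": ""},
        }
    }
}

print(content_namespace_ids(SAMPLE_SITEINFO))
```

Statistics that currently count only namespace 0 as articles could then count every ID returned by such a helper.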
Grantee reflection
We’d love to hear any thoughts you have on how the experience of being an IEGrantee has been so far. What is one thing that surprised you, or that you particularly enjoyed from the past 3 months?