Grants talk:IEG/Editor Behaviour Analysis
More specifics
What questions will be answered? What are the methods that you will use to answer them? How do you differentiate your work from stats.wikimedia.org?
I suggest that you pick a few analyses from past research that you thought were very interesting and could be important if updated, then propose to update and extend those studies with automated reports. You've already done some work to update and extend R:The Rise and Decline. I'd suggest looking at the following next:
- Wikipedian Self-Governance in Action: Motivating the Policy Lens, http://www.aaai.org/Papers/ICWSM/2008/ICWSM08-011.pdf
- Specifically, the graph showing diffusion of policy citation from active, core editors towards newbies.
- Creating, destroying, and restoring value in Wikipedia, http://www.katie.panciera.net/PriedhorskyGROUP2007.pdf
- Specifically, the graph showing the probability that any random pageview will see a vandalized version of an article (a rough sketch of how this could be estimated appears after this comment).
I see a clear way that I can help you with those two should you choose to pursue them. There are likely many other bits of past work that incorporate a visualization component and could be very interesting, but my pre-coffee brain can't recall at the moment. --Halfak (WMF) (talk) 13:10, 29 September 2015 (UTC)
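For illustration only, here is a minimal sketch of how that probability (the chance that a random pageview lands on a damaged revision) could be estimated with pandas. The file names, column names, and the 'damaged' flag (e.g. revisions later reverted as vandalism) are all assumptions, not the proposal's actual pipeline or the paper's exact method.

    import pandas as pd

    # Hypothetical inputs (file and column names are assumptions):
    #   revs  -- one row per revision: page_id, start_ts (when it became the live
    #            revision), damaged (True if later identified as vandalism, e.g. reverted)
    #   views -- one row per (page_id, hour): page_id, ts, view_count
    revs = pd.read_parquet("revisions_with_damage_flags.parquet")
    views = pd.read_parquet("hourly_pageviews.parquet")

    # Attach each hourly view bucket to the revision that was live at that hour.
    merged = pd.merge_asof(
        views.sort_values("ts"),
        revs.sort_values("start_ts"),
        left_on="ts", right_on="start_ts",
        by="page_id", direction="backward",
    )
    merged["damaged"] = merged["damaged"].fillna(False).astype(bool)

    # Probability that a random pageview in a given month landed on a damaged revision.
    merged["month"] = merged["ts"].dt.to_period("M")
    monthly = merged.groupby("month").apply(
        lambda g: g.loc[g["damaged"], "view_count"].sum() / g["view_count"].sum()
    )
    print(monthly)

The resulting per-month series is the kind of time series that could feed one of the proposed interactive graphs directly.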
Some of the questions I'm currently focusing on
- How are the active edit sessions in a month split across the different editor cohorts? How do the newcomers compare with the rest of the editor cohorts (in both percentage and absolute terms)? (A rough sketch of this kind of cohort split appears after this list.)
- How are the bytes added in a month split similarly?
- How have the longevity/retention rates changed for editor cohorts over time?
- How do the above change across the different languages?
- How do the above change as we go from active to very active editors?
- And what happens when we start looking at these from the articles' perspective? Article longevity, article edit activity, etc.
- What are the articles the newcomers work on? Are they editing older, established articles or newer ones?
- What are the articles being vandalised, and who are the vandals?
- What does the edit activity across a category look like? Are the articles edited mostly by experienced editors?
- What are the articles being edited on mobile and with VisualEditor, and who are the editors using them?
Some other ideas I'm thinking of are listed at Research:Editor Behaviour Analysis & Graphs/Ideas. I'll collate all of them soon.
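As a rough illustration of the cohort questions above, the following sketch computes the monthly split of edit counts and bytes added across cohorts defined by the month of first edit. The input file and column names are assumptions; the actual pipeline feeding the graphs may look quite different.

    import pandas as pd

    # Hypothetical revision history: one row per edit with user_id, timestamp, bytes_added.
    edits = pd.read_parquet("enwiki_edits.parquet")
    edits["month"] = edits["timestamp"].dt.to_period("M")

    # Cohort = month of each editor's first edit.
    first_edit = edits.groupby("user_id")["month"].min().rename("cohort")
    edits = edits.join(first_edit, on="user_id")

    # Monthly split of edit counts and bytes added across cohorts, e.g. what share of
    # this month's activity comes from newcomers versus older cohorts.
    split = edits.groupby(["month", "cohort"]).agg(
        edit_count=("user_id", "size"),
        bytes_added=("bytes_added", "sum"),
    )
    split["edit_share"] = split["edit_count"] / split.groupby("month")["edit_count"].transform("sum")
    print(split.head())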
Methods
I'll be using visualization techniques like the ones I've already used for the initial set of graphs. They will all be interactive and allow the user to filter each visualization by a metric relevant to it. Together we should be able to find our answers, or at least directions we can investigate further.
These visualizations vs stats.wikimedia.org
- Many of these visualizations show the split in activity by editor cohorts; stats shows only gross numbers for a month.
- These visualizations are interactive and allow the user to filter the data; most of the charts on stats are static.
- The charts on stats and the proposed visualizations here (some have already been built) are quite different and complement each other.
Hi Halfak (WMF), 'Creating, destroying, and restoring value in Wikipedia' looks interesting, especially the PWV. Let me do some more reading before I get back to you on this. Are there other metrics/papers you want me to look at, e.g. reverts, flagged revisions, etc.?--Jeph paul (talk) 13:40, 4 October 2015 (UTC)
Usability and usefulness as measures of success
Note: I've spoken with the proposal author about this off-wiki already; I'm just posting my comments here to make it all official. These graphs are intended to allow all sorts of researchers to explore data about editing trends over time. As such, it is important that these tools be easy to use for people with many different backgrounds, levels of expertise, and levels of experience with wiki-research. The proposal author currently lists "50 active users" as a measure of success. The proposer's previous IEG project, ReplayEdits, also had an adoption-related success measure. To help achieve adoption, the proposer should plan to conduct some sort of evaluation of the tool with a set of users that are similar to the intended audience. I suggest that the proposer build user studies into their project plan, so that he can evaluate his designs and improve upon them based on evidence and feedback from real users. I also suggest that he include a measure of success related to demonstrating that the tool can be used for its intended purpose by the people it's designed for--that users can explore data successfully, and interpret it correctly. I've volunteered to advise the proposer on how to design appropriate evaluation measures in order to demonstrate success according to these measures. And I believe that he can demonstrate success--his current prototypes are promising, and he has a good track record of building useful and usable software. Jtmorgan (talk) 18:42, 1 October 2015 (UTC)
Hi Jtmorgan, I have added getting user feedback throughout the project as one of the activities, though nothing detailed yet. How do we quantify the 'usefulness of the tool' as a measure of success for its intended users? Do we do a survey after the tools have been built? Check if anyone has cited the tools in their research? Your help is most welcome, thanks. --Jeph paul (talk) 13:53, 4 October 2015 (UTC)
- A survey would be appropriate. You could also run some user studies with researchers, and include a questionnaire at the end of the study, in which you could ask questions like "How does this tool compare to other tools you have used to visualize editor behavior trends?" and "Would you recommend this tool to other researchers? Why or why not?". By the way, regarding your "50+ active users" metric: how do you plan to track the usage of the tool? Cheers, Jtmorgan (talk) 17:41, 6 October 2015 (UTC)
- For the ReplayEdits project I use GA (Google Analytics) to track usage; it is deployed on GitHub. I'll have to figure out something else for Tool Labs.--Jeph paul (talk) 17:15, 7 October 2015 (UTC)
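For what it's worth, one privacy-conservative option on Tool Labs is to count requests server-side instead of using a third-party tracker. The sketch below assumes the tool is (or could be) served by a small Flask app; the route names are illustrative, and any such logging would still need to be checked against the Labs terms of use mentioned below.

    from collections import Counter
    from datetime import date
    from flask import Flask, jsonify

    app = Flask(__name__)
    daily_hits = Counter()  # in-memory aggregate; no per-user data is stored

    @app.before_request
    def count_hit():
        # Count the request only -- no IPs, user agents, or cookies are recorded.
        daily_hits[date.today().isoformat()] += 1

    @app.route("/usage")
    def usage():
        # Expose the aggregate counts so usage over time can be reported.
        return jsonify(dict(daily_hits))

    @app.route("/")
    def index():
        return app.send_static_file("index.html")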
- Sounds sensible. Make sure to check out the Labs terms of use, esp. as it relates to the tracking usage of labs-hosted tools. Yuvipanda or DAndreescu can probably answer any questions you might have. Cheers, Jtmorgan (talk) 19:24, 7 October 2015 (UTC)
- I really like this idea of better understanding editors and their motivations. The more the veil can be pulled back, the better. Geraldshields11 (talk) 21:07, 16 October 2015 (UTC)
Eligibility confirmed, round 2 2015
This Individual Engagement Grant proposal is under review!
We've confirmed your proposal is eligible for round 2 2015 review. Please feel free to ask questions and make changes to this proposal as discussions continue during this community comments period.
The committee's formal review for round 2 2015 begins on 20 October 2015, and grants will be announced in December. See the schedule for more details.
Marti (WMF) (talk) 02:18, 4 October 2015 (UTC)
Open source?
I suspect the answer to this question is "of course", but just for due diligence: do you intend to release the source code for this tool under an open license and post it (with documentation) in an open online repository? If so, this info should be added to the proposal, especially since one of your sustainability claims is "The visualization techniques being explored and developed in these graphs can be reused in other projects by researchers & individual editors." Cheers, Jtmorgan (talk) 17:43, 6 October 2015 (UTC)
- Yes, the Python scripts that generate the data for the visualizations, and the JS, HTML & CSS needed to render them, will all be available under an open license on GitHub: https://github.com/cosmiclattes/wikigraphs.
- There are some auxiliary datasets I'll be generating (e.g. the month of first edit of every editor). I use them in intermediate steps while creating the visualizations. These datasets are big, 200+ MB especially for 'en'. They will reside on the Tool Labs server, but I haven't yet thought about exposing them for general use.--Jeph paul (talk) 17:03, 7 October 2015 (UTC)
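Purely as an illustration of how an auxiliary dataset like the month of first edit could be generated on Tool Labs, here is a minimal sketch that queries a database replica and writes a CSV. The replica host name, credential file, and 2015-era revision schema are assumptions, not details from the proposal.

    import csv
    import pymysql

    # Illustrative connection settings; on Tool Labs, replica credentials
    # normally come from the tool account's replica.my.cnf file.
    conn = pymysql.connect(
        host="enwiki.labsdb",            # assumed replica host for 'en'
        db="enwiki_p",
        read_default_file="replica.my.cnf",
    )

    # Month of first edit for every registered editor (2015-era revision schema).
    query = """
        SELECT rev_user, MIN(LEFT(rev_timestamp, 6)) AS first_edit_month
        FROM revision
        WHERE rev_user > 0
        GROUP BY rev_user
    """

    with conn.cursor() as cur, open("first_edit_month_en.csv", "w", newline="") as out:
        cur.execute(query)
        writer = csv.writer(out)
        writer.writerow(["user_id", "first_edit_month"])
        writer.writerows(cur.fetchall())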
Aggregated feedback from the committee for Editor Behaviour Analysis
Scoring criteria (see the rubric for background) | Score (1 = weak alignment, 10 = strong alignment)
(A) Impact potential | 7.2
(B) Innovation and learning | 7.2
(C) Ability to execute | 7.4
(D) Community engagement | 7.0
Comments from the committee:
Round 2 2015 decision
Congratulations! Your proposal has been selected for an Individual Engagement Grant.
The committee has recommended this proposal and WMF has approved funding for the full amount of your request, $1,000.
Comments regarding this decision:
The Committee supports your work to deepen understanding of how editors contribute to Wikimedia projects. We appreciate your efforts to create automatically updating visualizations that will have ongoing value beyond the life of your grant. We look forward to discussing possibilities for concrete applications of your research going forward.
Next steps:
- You will be contacted to sign a grant agreement and set up a monthly check-in schedule.
- Review the information for grantees.
- Use the new buttons on your original proposal to create your project pages.
- Start work on your project!