User talk:Halfak (WMF)/Archive
Welcome to Meta!
Hello, Halfak (WMF)/Archive. Welcome to the Wikimedia Meta-Wiki! This website is for coordinating and discussing all Wikimedia projects. You may find it useful to read our policy page. If you are interested in doing translations, visit Meta:Babylon. You can also leave a note on Meta:Babel or Wikimedia Forum (please read the instructions at the top of the page before posting there). Happy editing!
-- Meta-Wiki Welcome (talk) 15:47, 25 September 2013 (UTC)
Love your work
Hi Aaron. Thanks very much for all your great work. I find your research endlessly fascinating and enlightening. I was just reading through User talk:Halfak (WMF)/New page creations, deletions, and drafts and was surprised that the number of pages being kept was rising a little over the last few years. I would have expected that they would be dropping with the ever-increasing quality requirements enwp has. This is really good stuff and important to have facts like this in order to make informed decisions about how the project runs. I'm really very happy to have you researching these various areas. I was just wondering if your work is included in any of our newsletters or what-have-you. I seem to find it by chance and would dearly love to be kept up-to-date on your latest findings if they are published somewhere. Thanks again for all your great work. 64.40.54.90 04:41, 26 November 2013 (UTC)
- Thanks! I'm glad that you're finding my work helpful. I appreciate your feedback and questions. My plan is to publicize the work widely when I've completed the study. However, I can now see that your point still stands with regards to my in-progress work. Besides transparency for its own sake, the point of putting my work on-wiki is to get feedback/questions/collaboration from others such as yourself.
- Right now, I'm working out the right way™ to track my progress on these projects. See R:Module storage performance for an example project where I'm experimenting with not tracking progress in my user space. One of the things that I'd really like to figure out is a way to publish a self-updating recent activity list that tracks my work as well as the work of other researchers/analysts on meta (e.g. DarTar and Erik Zachte). (See R:L2 for an example of my work on a general space for tracking research activities and facilitating collaborations.) Once I've nailed down a structure that seems to work, I'll be trying to bring more visibility to in-progress projects. --Halfak (WMF) (talk) 15:11, 26 November 2013 (UTC)
"I'm glad that you're finding my work helpful."
Not just helpful, also incredibly valuable. Thanks. I was reading R:Module storage performance the other day and also the stuff in User talk:Halfak (WMF)/New page creations, deletions, and drafts/Archive too. I find your notes very helpful. I'm a data junkie and it's incredibly valuable to have real, honest-to-goodness data for understanding what's going on with the project. I didn't know about R:Labs2 and see that it's been around for a couple months. I see I have some catching up to do.
"I'll be trying to bring more visibility to in-progress projects"
Thanks very much. I would really appreciate that. 64.40.54.4 05:49, 27 November 2013 (UTC)
- For the record, my activity log is here. It's not self-updating, but I can live with that. Erik Zachte (talk) 14:55, 1 February 2014 (UTC)
Hi Aaron, this grant proposal is now live. Please edit liberally with comments on the talk page and update any sections that you can complete. Thanks! --Pine✉ 18:08, 30 September 2014 (UTC)
- Great! I'll take a pass this evening (UTC -5). --Halfak (WMF) (talk) 18:33, 30 September 2014 (UTC)
Test
Test ping for EpochFail --Halfak (WMF) (talk) 15:45, 14 October 2014 (UTC)
Most edited
Hi, Aaron! I liked your work on the most edited articles for English Wikipedia very much! (https://meta.wikimedia.org/wiki/Research:Top_edited_articles_in_2014#Results). Could you prepare the same for Ukrainian Wikipedia? I want to compare the results and publish a post on a Ukrainian blog. --A1 (talk) 14:37, 4 January 2015 (UTC)
- Hi A1! I'd be happy to. I've just kicked off the queries now. --Halfak (WMF) (talk) 16:30, 5 January 2015 (UTC)
- A1 Here's a quick view of the top 10 articles by edits. I can do a more substantial writeup like I did for English if you want to look at month-to-month. --17:02, 5 January 2015 (UTC)
Thanks a lot, Aaron! Great job! Yes, if possible it would be interesting to study month-to-month statistics. --A1 (talk) 11:54, 6 January 2015 (UTC)
ping A1. :) --Halfak (WMF) (talk) 20:58, 6 January 2015 (UTC)
- Thanks a lot! This was the publication in WMUA blog --A1 (talk) 09:43, 8 January 2015 (UTC)
- Great! Happy to help. Let me know if you want to do something like this again next year. It would be nice to incorporate some of Brian Keegan's methods too. :) --Halfak (WMF) (talk) 19:19, 9 January 2015 (UTC)
Wikitech
Dear Halfak,
where do you execute your metrics SQL?
For example this from Research:Top edited articles in 2014:
SELECT
LEFT(rev_timestamp, 6) AS month,
rev_page AS page_id,
COUNT(*) AS edits
FROM revision
WHERE rev_timestamp BETWEEN "2014" AND "2015"
GROUP BY month, page_id
ORDER BY month ASC, edits DESC;
--Kopiersperre (talk) 18:23, 1 March 2015 (UTC)
- Kopiersperre sorry for the delay. I missed your message somehow. I ran that query on the analytics servers we have at the Wikimedia Foundation for doing research and analytics work. However, you can run queries like this one on http://quarry.wmflabs.org/ via a public web interface. Let me know if you need a hand. You can find me in IRC in #wikimedia-research as "halfak". --Halfak (WMF) (talk) 00:42, 11 March 2015 (UTC)
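For readers without database access, the aggregation in the query quoted above can be mimicked offline. This is only a minimal sketch in Python, not the code used for the study: it assumes revisions are available as `(rev_timestamp, rev_page)` tuples with MediaWiki-style 14-digit timestamps, and the function name `monthly_edit_counts` and the sample rows are made up for illustration.

```python
from collections import Counter


def monthly_edit_counts(revisions):
    """Count edits per (month, page_id), like GROUP BY month, page_id."""
    counts = Counter()
    for rev_timestamp, rev_page in revisions:
        # Mirror the string comparison in BETWEEN "2014" AND "2015".
        if "2014" <= rev_timestamp <= "2015":
            month = rev_timestamp[:6]  # same as LEFT(rev_timestamp, 6)
            counts[(month, rev_page)] += 1
    # Same ordering as ORDER BY month ASC, edits DESC.
    return sorted(
        ((month, page, n) for (month, page), n in counts.items()),
        key=lambda row: (row[0], -row[2]),
    )


# Hypothetical sample data, not real revisions.
sample = [
    ("20140103120000", 1),
    ("20140115093000", 1),
    ("20140120080000", 2),
    ("20140201000000", 2),
    ("20150101000000", 1),  # sorts after "2015", so it is excluded
]
rows = monthly_edit_counts(sample)
for month, page_id, edits in rows:
    print(month, page_id, edits)
```

Because the timestamps are compared as strings, any 2015 timestamp sorts after the bare string "2015" and is left out, which matches the behaviour of the SQL filter.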
Global user page
You should replace your local redirects by a global user page... Helder 23:43, 10 March 2015 (UTC)
- Woah! Cool! Thanks for the tip! --Halfak (WMF) (talk) 00:39, 11 March 2015 (UTC)
Request for a second pair of eyes
Please take a look at my note here and add any clarifications or questions as you think best. Thanks, --Pine✉ 23:53, 28 August 2015 (UTC)
- Hey Pine. I'm not sure what you'd like me to look at here. --Halfak (WMF) (talk) 15:51, 1 September 2015 (UTC)
- Sorry, I'll be more specific. I'd just appreciate a check of this analysis that I gave of the graph: "for the first 6 months of 2015, the highly active editor stats were consistently higher than the 2014 stats during the same months year-over-year, which is good news. Also, the English Wikipedia July 2015 number shows a noteworthy increase over the July 2013 number on this chart; another month of significant positive divergence from 2013 would suggest a trend that meaningfully exceeds the 2013 population statistics." --Pine✉ 18:37, 1 September 2015 (UTC)
- @Pine:. I'd have to perform my own analysis to check these numbers. I don't see a methodology to review or a suggestion of what the word "noteworthy" would mean. Generally, I don't want to believe this is good news until we see an improvement in social health indicators like retention. Right now, I suspect we have hit our population capacity given our current (bad) retention rates. But in order to know in a useful way, we'd need to commission a study that would take someone like me 2-3 weeks of fulltime work -- all just to know what we're looking at. We may never know the true cause. --Halfak (WMF) (talk) 20:45, 1 September 2015 (UTC)
VisualEditor May 2015 study - medium term results
Research:VisualEditor's_effect_on_newly_registered_editors/May_2015_study
It's been a few months, could you check the medium term survival and total productive edits for the Control and Experimental groups? Alsee (talk) 06:32, 22 September 2015 (UTC)
- It's on my backlog. I'm overloaded now with a few other things, but I should be able to get to it in the next couple of weeks. I'll ping here when I do. --Halfak (WMF) (talk) 17:49, 23 September 2015 (UTC)
- Hi Alsee! I completed a preliminary survival analysis of the VE experiment cohorts. It doesn't look like we're seeing a significant difference. See my notes here Research talk:VisualEditor's effect on newly registered editors/Work log/2015-09-30. --Halfak (WMF) (talk) 18:34, 30 September 2015 (UTC)
- Thanx for the update. I see the survival figures, but how about edits? In your previous analysis you actively filtered out data when people made a larger number of edits. That's the segment I'm trying to get at. One person who gets serious, learns what they're doing, and stays to make hundreds of expert edits is far more important than hundreds of people making one newbie edit each. Alsee (talk) 21:23, 1 October 2015 (UTC)
- Hey Alsee. It seems likely that, if someone is still saving a substantial amount of edits 2-3 months into their time as a registered user, they aren't saving newbie edits anymore. Either way, it's somewhat problematic to measure the total amount of productive edits at long time scales where we are likely to see few users saving edits since outliers will have a large effect. But... a Wilcoxon test should still work since it's non-parametric. I'll do some tests, give it a try, and ping again. --Halfak (WMF) (talk) 22:17, 1 October 2015 (UTC)
- @Alsee and Halfak (WMF): I did a Wilcoxon test on the total number of edits in each bucket when I was originally doing this follow-up. Long story short, there was no significant difference (p = 0.857). I've been thinking about how I should post my work; Halfak, what do you think about me checking my Jupyter notebook and TSV files into a git repo and uploading to Github? I could then create a stub Research: page linking to it.—Neil P. Quinn-WMF (talk) 20:57, 5 October 2015 (UTC)
- Cool! Thanks for getting that done, Neil P. Quinn-WMF. +1 for putting the notebook and files on Github. It would be great if we could somehow have an ipynb-->wikitext translator or an ipynb viewer on MediaWiki. Some day. --Halfak (WMF) (talk) 16:06, 6 October 2015 (UTC)
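To illustrate the kind of non-parametric comparison discussed in this thread, here is a minimal sketch of a two-sided Wilcoxon rank-sum test in plain Python. It is not the notebook mentioned above. It assumes two lists of per-user edit counts, uses average ranks for ties and a tie-corrected normal approximation without continuity correction, and the names `average_ranks` and `rank_sum_test` as well as the sample numbers are invented for illustration.

```python
import math
from collections import Counter


def average_ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def rank_sum_test(a, b):
    """Return (U statistic for a, two-sided p-value, normal approximation)."""
    n1, n2 = len(a), len(b)
    n = n1 + n2
    combined = list(a) + list(b)
    ranks = average_ranks(combined)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    # Variance of U with the standard correction for tied values.
    tie_term = sum(t ** 3 - t for t in Counter(combined).values())
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:  # every value identical: no evidence of a difference
        return u1, 1.0
    z = (u1 - mean_u) / math.sqrt(var_u)
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return u1, p_value


# Hypothetical edit counts per user; these are not the study's data.
control = [0, 1, 1, 3, 10]
experimental = [0, 2, 2, 4, 50]
u_stat, p = rank_sum_test(control, experimental)
print(u_stat, round(p, 3))
```

Only the ordering of the values enters the statistic, so one user with a very large edit count moves the result no more than any other top-ranked user would. The normal approximation is rough for samples as small as the toy lists here.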
At User Talk:Jimbo Wales
I mentioned you at User_talk:Jimbo_Wales because I think you are doing some of the things mentioned in the on-going discussion of quality issues. I'm certainly interested in what you are doing, but am not informed enough about it to ask intelligent questions yet. Smallbones (talk) 15:23, 23 September 2015 (UTC)
- Thanks for the ping. I've been a bit overloaded and am just leaving town now -- hence my delayed response. See Research:Measuring value-added and ORES for my recent work in measuring quality/productivity. I'll respond more substantially when I get back next week. --Halfak (WMF) (talk) 18:31, 25 September 2015 (UTC)
Enjoy your weekend
Thanks for your efforts. I hope that you get some nice R&R this weekend. --Pine✉ 20:51, 25 September 2015 (UTC)
- \o/ Thanks. It was a good time. Now back to the salt mines. ;) --Halfak (WMF) (talk) 15:21, 28 September 2015 (UTC)
- Really? Salt mines? /me stares at her whip... --Elitre (WMF) (talk) 15:31, 28 September 2015 (UTC)
- Ahh yes. The "knowledge mines" I suppose would be more accurate. Or maybe "data mines". ;) --Halfak (WMF) (talk) 15:52, 28 September 2015 (UTC)
Experiment defaulting anonymous editors to the visual editor
https://phabricator.wikimedia.org/T119269 Run experiment defaulting anonymous editors to the visual editor on the English Wikipedia
Hi. I assume this would be similar to the May_2015_study? Could you give me a ping when there's any documentation or start up on this? I'd be interested to follow the process and results. Thanx! Alsee (talk) 01:37, 25 December 2015 (UTC)
- Hey Alsee. Neil P. Quinn-WMF will be running this study. I'll be supporting though. :) I imagine the design will be similar to past studies, but I haven't talked to Neil about the details of what they have planned yet. --Halfak (WMF) (talk) 14:20, 29 December 2015 (UTC)
@Alsee: Yep, I'll be leading the enwiki anons experiment! I've (finally) started work on the study plan at Research:Visual editor for anonymous users, 2016. Comments are, obviously, welcome.—Neil P. Quinn-WMF (talk) 00:15, 10 February 2016 (UTC)
Research barnstar
I've seen you quoted, in more than one news article about Wikipedia 15, discussing the decline in the number of Wikipedia contributors and what can be done about it. --Pine✉ 19:07, 15 January 2016 (UTC)
Coffee
Thanks for your participation in the office hour about instructional video! --Pine✉ 03:31, 27 January 2016 (UTC)
- No problem. Happy to help where I can. :) --Halfak (WMF) (talk) 14:43, 27 January 2016 (UTC)
Growth of Wikipedia
Hi. I was reading about your research on the Signpost, and I wondered if you had any thoughts about this model on enwiki? It has only four parameters, but seems to have been doing a pretty good job over the last few years, fitting pretty much the entire history of both the English and German Wikipedias* and has shown clear predictive power: it predicted an eventual return to slow exponential growth, after a period of earlier deceleration, at a time a couple of years ago, when the permanent decline of Wikipedia's growth appeared to be the trend based on the "three phase" view of the development of Wikipedia.
- [* with the exception of the server slowdown and Rambot incidents on enwiki.]
-- The Anome (talk) 14:41, 28 January 2016 (UTC)
- Hey Anome, what signpost article are you referring to? Re. modeling Wikipedia's growth, I'm not sure that measuring the raw count of articles is an interesting outcome. I also don't see an exponential decay as "a return to slow exponential growth". There's no literature I know of that supposes that the "three phase" view implies a "permanent decline" of Wikipedia. The "three phase" graph you refer to (that I assume to be the one described at R:The Rise and Decline) isn't really a view so much as a language for discussing a reality. We can describe this reality with phases or not -- however you like.
- I think the real question -- the one I discuss in the literature and all of my talks -- has to do with the sudden drop and non-recovery in the retention rate of good-faith newcomers. This has substantial implications for coverage biases (e.g. maybe editors who would have written about certain subjects are among those no longer retained) and the long-term maintenance of the editing community. However, assuming that we hold stable at this lower survival rate, a population model would predict that we would eventually stabilize at a new carrying capacity. I think we're seeing that happen right now as English Wikipedia's decline has normalized. IMO, this is not a victory. We still need to find better ways to help good-faith newcomers find editing to be a rewarding experience -- as rewarding as it was to edit Wikipedia before we refocused towards quality control. --Halfak (WMF) (talk) 16:23, 31 January 2016 (UTC)
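The carrying-capacity argument above can be illustrated with a toy model. This is a minimal sketch under strong assumptions and not a model taken from the research discussed here: each period a fixed number of newcomers (`inflow`) join and a fixed fraction of existing editors (`retention`) stay, so the population follows N[t+1] = retention * N[t] + inflow and settles at inflow / (1 - retention). All names and numbers are hypothetical.

```python
def simulate_population(initial, inflow, retention, periods):
    """Iterate N[t+1] = retention * N[t] + inflow and return every N[t]."""
    population = float(initial)
    history = [population]
    for _ in range(periods):
        population = retention * population + inflow
        history.append(population)
    return history


def carrying_capacity(inflow, retention):
    """Equilibrium of the recurrence; requires 0 <= retention < 1."""
    return inflow / (1 - retention)


# Hypothetical numbers: the same inflow under two retention rates.
high_retention = simulate_population(1000, 100, 0.9, 200)
low_retention = simulate_population(1000, 100, 0.8, 200)
print(round(high_retention[-1]), carrying_capacity(100, 0.9))
print(round(low_retention[-1]), carrying_capacity(100, 0.8))
```

In this toy model a lower retention rate does not make the population shrink forever. It levels off at a smaller equilibrium, which is the sense in which a decline can stabilize without retention having recovered.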
- Hi. I'm not questioning your research -- I find it really interesting, and I think your emphasis on the community is really important for helping ensure the long-term future of the curation and development of Wikipedia. Nevertheless, I think other metrics are still worth contemplating, as the encyclopedia paradoxically continues to improve and grow in spite of the worrying trends in community engagement. For example:
- Number of articles -- representing the number of topics Wikipedia covers
- Size of articles: up to a point, bigger is better
- Quality of articles: more is always better, but different kinds of quality (for example, clarity and technical accuracy) can often compete with one another
- Connectedness of articles: the web of connections is one of the most critical aspects of Wikipedia, unprecedented in earlier works.
- I think I understand what you mean to discuss better. Thanks for clarifying. One quick question. It seems that you are suggesting that Wikipedia's continued improvement and growth is paradoxical. Yet, I think that the evidence suggests that new article creation rates are slowing and that edit rates are slowing. So where is this paradox? --Halfak (WMF) (talk) 18:16, 7 February 2016 (UTC)
Generally I would appreciate if the old research plots would be continued. The findings probably stay the same, but it would be very interesting to see the further development.--Kopiersperre (talk) 16:30, 15 February 2016 (UTC)
- Agreed. These things are difficult to keep up to date because they take so much time. I'd also really like to see an update to the desirable newcomer survival graph above, but that would require substantial effort put into reviewing new editors' activities for recent years. Kopiersperre, would you be willing to help with such an effort? --Halfak (WMF) (talk) 17:04, 16 February 2016 (UTC)
- Of course. I use gnuplot for my plots on Commons. This enables possibly everyone to update from new csv data.--Kopiersperre (talk) 10:51, 17 February 2016 (UTC)
- Kopiersperre, oh. I use en:ggplot2. If we had the data, updating this plot is easy because that's FOSS. It's just that generating that data requires an investment of a substantial amount of time and energy -- we need to have a few people label new editors as good-faith or otherwise. --Halfak (WMF) (talk) 18:05, 21 February 2016 (UTC)
- How much data do you need for updating? Was it really produced merely by manual classification?--Kopiersperre (talk) 19:54, 21 February 2016 (UTC)
- We worked with 100 observations (newcomers who edited at least one article) per 6 month period. So, 200 per year or ~1000 for the 5 years since the plot above. Also, I'm not sure what you are talking about re "merely manually classifying". It's hard work. --Halfak (WMF) (talk) 13:55, 22 February 2016 (UTC)
- I am able to work hard. Just give me some 100 users and I will try to classify them.--Kopiersperre (talk) 21:34, 23 February 2016 (UTC)
- Hey Kopiersperre. So, to do this, I need to (1) set up Wiki labels to show an edit-collection view (for the newcomer's first session of editing), (2) gather a random sample of newcomers from the time periods we'd like to look at and (3) get to work doing the labeling. This will take a while before I can find time for it. But I'll ping you when I work that out. --Halfak (WMF) (talk) 23:42, 26 February 2016 (UTC)
I have been using this script already, but I felt it was slowing down my Wikipedia experience too much. I'm sorry, but I have quite slow internet. Is there any script-free solution? I would even do this by manually storing the results in an Excel sheet.--Kopiersperre (talk) 17:08, 27 February 2016 (UTC)
- Kopiersperre The Wiki labels script was slowing down your connection? That's strange and probably a bug. Can you tell me more about what you were experiencing? --Halfak (WMF) (talk) 19:31, 1 March 2016 (UTC)
- I don't know which of my scripts is slowing down, but one is the culprit.--Kopiersperre (talk) 11:52, 2 March 2016 (UTC)
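Step (2) of the plan in this thread, drawing a fixed-size random sample of newcomers per time period, can be sketched as follows. This is only an illustration in Python and not the Wiki labels tooling: it assumes newcomers are given as `(user_id, period_label)` pairs, and the function name `sample_per_period`, the period labels, and the seed are made up.

```python
import random


def sample_per_period(newcomers, per_period=100, seed=0):
    """Draw up to `per_period` users at random from each period."""
    rng = random.Random(seed)  # fixed seed so the sample can be reproduced
    by_period = {}
    for user_id, period in newcomers:
        by_period.setdefault(period, []).append(user_id)
    sample = {}
    for period, users in sorted(by_period.items()):
        k = min(per_period, len(users))  # small periods are taken whole
        sample[period] = rng.sample(users, k)
    return sample


# Hypothetical input: 1000 user ids split across two half-year periods.
newcomers = [(i, "2011-H1" if i % 2 else "2011-H2") for i in range(1000)]
sample = sample_per_period(newcomers)
print({period: len(users) for period, users in sample.items()})
```

With 100 observations per six-month period, ten periods give the roughly 1000 users mentioned in the thread for five years.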
VE data
I've been thinking it would be helpful to have more hard data to either confirm the value of VE, or confirm VE's failure to produce the intended benefits. (i.e. the May 2015 VE study showing an absence of the intended benefits.)
Is there any data on whether new accounts embrace or abandon VE over time? And if not, would it be easy and worthwhile to extract that information? To clarify my general intent, I'm picturing accounts created after VE was provided as a second edit tab. (Either all such accounts, or a sample group with significant age.) I'm picturing something like a graph where the vertical axis is % of edits using VE, and the horizontal axis would be the Nth edit made by users. For users who do make a 50th edit, what % of those edits use VE? For users who do make a 200th edit, what % of those edits use VE? The data would get sparse at higher edit counts, but that's fine because we're looking at percentage. Someone's first edit may be roughly a 50/50 random split for which editor they experiment with first, but over time do they settle into heavy usage of VE because they find it easier and more effective? Or do they abandon VE because wikitext editing is easier and more effective? Alsee (talk) 02:01, 14 April 2016 (UTC)
- Ping: User:NQuinn (WMF). I figure you might have an answer. --Halfak (WMF) (talk) 20:22, 14 April 2016 (UTC)
- I would also be very interested to see the results of this research. I'm now making increasing use of the VisualEditor as it becomes more useful, but it's still been a steep learning curve for making all but the very simplest edits, as it still has an occasional tendency to do nasty Microsoft-Word-like things when you hit edge cases. -- The Anome (talk) 12:36, 3 August 2016 (UTC)
- I'm still looking for this. It's not good when conflicts arise due to different people having contradicting assumptions about reality. If I'm wrong I want to know it, and if I'm right I want data to back me up. Alsee (talk) 08:41, 4 August 2016 (UTC)
- @Alsee and The Anome: I'm afraid that I don't have the data at hand and that it would not be easy to generate: for example, one of several issues I can foresee is that the MediaWiki databases have no concept of an edit being a user's Nth. But it is an interesting question, so I've put the task on my team's backlog, and if we have a chance to work on it in the future, we certainly will. In the meantime, if you know of anyone who'd like to attempt this research on their own (it should be possible, though difficult, using Quarry), I'd be happy to advise them.—Neil P. Quinn-WMF (talk) 18:57, 8 August 2016 (UTC)
- Thank you. This sort of data would also be useful to have for other sorts of A/B testing. -- The Anome (talk) 23:35, 8 August 2016 (UTC)
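As a rough illustration of the analysis requested in this thread, the sketch below computes, for each edit ordinal N, the share of users' Nth edits that were made with VE. Since the databases have no notion of a user's Nth edit, the ordinal is derived by sorting each user's edits by timestamp. This is a minimal sketch and not an existing query: it assumes edits have already been exported as `(user, timestamp, used_ve)` tuples, and the function name `ve_share_by_edit_ordinal` and the sample data are invented.

```python
from collections import defaultdict


def ve_share_by_edit_ordinal(edits):
    """Map N to the fraction of users' N-th edits that used VE."""
    per_user = defaultdict(list)
    for user, timestamp, used_ve in edits:
        per_user[user].append((timestamp, used_ve))

    totals = defaultdict(int)     # number of users who made an N-th edit
    ve_counts = defaultdict(int)  # how many of those N-th edits used VE
    for user_edits in per_user.values():
        user_edits.sort(key=lambda edit: edit[0])  # chronological order
        for n, (_, used_ve) in enumerate(user_edits, start=1):
            totals[n] += 1
            if used_ve:
                ve_counts[n] += 1
    return {n: ve_counts[n] / totals[n] for n in sorted(totals)}


# Hypothetical edits for two users; not real data.
edits = [
    ("A", "20160101000000", True),
    ("A", "20160102000000", False),
    ("A", "20160103000000", False),
    ("B", "20160101000000", True),
    ("B", "20160105000000", True),
]
shares = ve_share_by_edit_ordinal(edits)
print(shares)
```

Each ordinal is averaged only over users who actually reached that edit, so the values at high N rest on fewer users, as anticipated in the original question.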
Thinking of you
Hey there. Sherry and I wanted to award you this barnstar for "being awesome". Sorry the rationale isn't much detailed: work logs are also not available :) Rock on, --Elitre (WMF) (talk) 14:25, 27 May 2016 (UTC)
- <3 Thank you! I needed this right now. --Halfak (WMF) (talk) 15:04, 27 May 2016 (UTC)
ORCID
You may wish to add your ORCID iD to your user page (as I have done on mine), using Template:User ORCID. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 14:18, 16 August 2016 (UTC)
- Pigsonthewing, thanks Done. --Halfak (WMF) (talk) 18:53, 16 August 2016 (UTC)