Research talk:Newcomer task suggestions/Work log/2014-08-01
Friday, August 1st
Today, I'm doing some stats on our hand codings. First, we'll look at the overlap.
> kappam.fleiss(overlap[,list(title.aaron>0, title.maryana>0, title.steven>0)])
 Fleiss' Kappa for m Raters
 Subjects = 54
   Raters = 3
    Kappa = 0.257
        z = 3.28
  p-value = 0.00105

> kappam.fleiss(overlap[,list(text.aaron>0, text.maryana>0, text.steven>0)])
 Fleiss' Kappa for m Raters
 Subjects = 54
   Raters = 3
    Kappa = 0.156
        z = 1.98
  p-value = 0.0477
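For reference, here's a minimal sketch of the setup behind those calls. It assumes overlap is a data.table with one row per recommendation and per-coder count columns (the column names come from the output above; the library loading is my assumption):

# Sketch only: assumes `overlap` is a data.table with per-coder count
# columns (title.aaron, title.maryana, title.steven and the text.* set).
library(data.table)  # for the overlap[, list(...)] column-selection syntax
library(irr)         # provides kappam.fleiss()

# Reduce each coder's counts to a binary "judged relevant" flag and
# compute agreement across the three coders.
kappam.fleiss(overlap[, list(title.aaron > 0,
                             title.maryana > 0,
                             title.steven > 0)])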
Well. These results put us in the "slight" to "fair" agreement range, which is not good enough for combining observations. Usually we're looking for a Kappa above 0.6.
So, let's just not combine the hand-codings and instead look at what each hand-coder thought.
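Something like the following sketch is what I mean by looking at each coder separately (the rank column and exact schema of overlap are assumptions; adjust to the real column names):

# Sketch only: proportion of title recommendations each coder judged
# relevant, split at a hypothetical rank cutoff of 15. Assumes `overlap`
# carries a per-recommendation `rank` column.
overlap[, list(
    aaron   = mean(title.aaron > 0),
    maryana = mean(title.maryana > 0),
    steven  = mean(title.steven > 0)
), by = list(within_15 = rank <= 15)]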
I think the TL;DR of this analysis is: we can recommend up to 50 titles if we want, but 15 is a fine cutoff. Once we look at the text, we agreed that at least 75% of recommendations are relevant all the way out to rank 50. That means that if we list 3 on the page, there's only a 1.6% chance that none of them will actually be similar.
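To spell out the arithmetic: if each listed recommendation is independently relevant with probability p, the chance that none of the 3 listed is relevant is (1 - p)^3. A quick check:

# Chance that none of n listed recommendations are relevant, assuming
# each is independently relevant with probability p.
p_none_relevant <- function(p, n = 3) (1 - p)^n

p_none_relevant(0.75)  # ~0.0156, i.e. about 1.6% at the 75% relevance floor
# Applying the same formula to the top-3 relevance rate below gives the
# ~0.016% figure.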
For the first 3 recommendations, 94.% were similar. That means we have a 0.016% chance that none of the three top recommended articles will be similar. Cool! --Halfak (WMF) (talk) 19:14, 1 August 2014 (UTC)