Research talk:Newcomer task suggestions/Work log/2014-08-01

Friday, August 1st

Today, I'm doing some stats on our hand codings. First, we'll look at the overlap.

> kappam.fleiss(overlap[,list(title.aaron>0, title.maryana>0, title.steven>0)])
 Fleiss' Kappa for m Raters

 Subjects = 54 
   Raters = 3 
    Kappa = 0.257 

        z = 3.28 
  p-value = 0.00105 
> kappam.fleiss(overlap[,list(text.aaron>0, text.maryana>0, text.steven>0)])
 Fleiss' Kappa for m Raters

 Subjects = 54 
   Raters = 3 
    Kappa = 0.156 

        z = 1.98 
  p-value = 0.0477 

Well. These results put us solidly in the "slight" to "fair" agreement range, which is not good enough to justify combining observations. Usually we're looking for a kappa above 0.6.
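
For context, a kappa above 0.6 lands in "substantial" agreement on the commonly cited Landis & Koch (1977) scale. A minimal R sketch of that scale as a reading aid (the kappa.label helper is illustrative, not part of the analysis above):

 # Where a kappa value falls on the Landis & Koch (1977) scale; the 0.6 bar
 # mentioned above is the start of "substantial" agreement.
 kappa.label <- function(k) {
   cut(k,
       breaks = c(-Inf, 0, 0.20, 0.40, 0.60, 0.80, 1.00),
       labels = c("poor", "slight", "fair", "moderate",
                  "substantial", "almost perfect"))
 }
 kappa.label(c(0.257, 0.156))  # 0.257 -> fair, 0.156 -> slight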

So, let's just not combine the hand-codings and instead look at what each hand-coder thought.

Similarity of recommended articles by rank. Proportions of hand-coded similar titles are plotted by rank buckets (bucket size = 5) for each coder.
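
For reference, a rough R sketch of how a plot like that can be put together. The table name and columns (codings, coder, rank, similar) are stand-ins with synthetic data, not the actual objects used in this analysis:

 # Stand-in data: one row per (coder, recommendation), with the
 # recommendation's rank and a binary "similar" judgement.
 library(data.table)
 set.seed(0)
 codings <- data.table(
   coder   = rep(c("aaron", "maryana", "steven"), each = 50),
   rank    = rep(1:50, times = 3),
   similar = rbinom(150, 1, 0.8)
 )
 
 # Bucket ranks into groups of 5 and compute each coder's proportion of
 # recommendations judged similar within each bucket.
 codings[, bucket := ceiling(rank / 5) * 5]
 props <- codings[, list(p.similar = mean(similar)), by = list(coder, bucket)]
 
 # One line per coder: proportion judged similar vs. rank bucket.
 plot(p.similar ~ bucket, data = props[coder == "aaron"], type = "b",
      ylim = c(0, 1), xlab = "Rank bucket (top of bucket)",
      ylab = "Proportion judged similar")
 lines(p.similar ~ bucket, data = props[coder == "maryana"], type = "b", col = "red")
 lines(p.similar ~ bucket, data = props[coder == "steven"], type = "b", col = "blue")
 legend("bottomleft", legend = c("aaron", "maryana", "steven"),
        col = c("black", "red", "blue"), lty = 1)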

I think the TL;DR of this analysis is that we can recommend up to 50 titles if we want, but 15 is a fine cutoff. Once we look at the text, we agreed that at least 75% of recommendations are relevant all the way out to rank 50. That means that if we list 3 on the page, there's a 1.6% chance that none of them will actually be similar.
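
The arithmetic behind that 1.6%, treating the three listed items as independent draws from the recommendations:

 # Chance that none of 3 listed recommendations is similar, given that
 # at least 75% of recommendations are relevant.
 (1 - 0.75)^3  # = 0.015625, i.e. about 1.6%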

For the first 3 recommendations, 94.% were judged similar. That means we have only a 0.016% chance that none of the three top recommended articles will be similar. Cool! --Halfak (WMF) (talk) 19:14, 1 August 2014 (UTC)