Research talk:Onboarding new Wikipedians/OB6/Work log/2013-11-27
Wednesday, November 27th - Survival model
I hacked together a survival model based on the number of sessions completed. I considered a user "death" to occur when they don't come back to edit for at least 1 week. The first model predicts the hazard by test condition alone:
Call:
coxph(formula = Surv(sessions, !censored) ~ bucket, data = user.metas)

  n= 26920, number of events= 26657

                coef exp(coef)  se(coef)      z Pr(>|z|)
buckettest -0.009573  0.990473  0.012250 -0.781    0.435

           exp(coef) exp(-coef) lower .95 upper .95
buckettest    0.9905       1.01     0.967     1.015

Concordance= 0.503  (se = 0.005 )
Rsquare= 0   (max possible= 1 )
Likelihood ratio test= 0.61  on 1 df,   p=0.4345
Wald test            = 0.61  on 1 df,   p=0.4345
Score (logrank) test = 0.61  on 1 df,   p=0.4345
Basically, what this is saying is that being in the test condition slightly lowers the hazard (of leaving), but the confidence in this effect is really bad (p=0.43): the 95% interval for the hazard ratio (0.967 to 1.015) includes 1.
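As a sanity check on that reading, the hazard ratio, its 95% interval, and the Wald p-value can be recomputed from nothing but the reported coefficient and standard error. This is a minimal sketch in base R; the variable names are my own, and the two input numbers are copied from the summary above.

```r
# Values copied from the coxph summary for buckettest.
coef <- -0.009573
se   <-  0.012250

hazard_ratio <- exp(coef)                        # the exp(coef) column
ci <- exp(coef + c(-1, 1) * qnorm(0.975) * se)   # lower .95 and upper .95
z  <- coef / se                                  # Wald z statistic
p  <- 2 * pnorm(-abs(z))                         # two-sided p-value

print(round(c(hazard_ratio = hazard_ratio, lower = ci[1], upper = ci[2],
              z = z, p = p), 4))
```

The interval straddling 1 is the same statement as the p-value being large: the data are compatible with the test condition either slightly lowering or slightly raising the hazard.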
Next I tried to control for initial investment by including the amount of time spent editing in the first session as a predictor. I've used this in the past (see R:First edit session) as a good way to explain some of the noise around less important predictors.
Call:
coxph(formula = Surv(sessions, !censored) ~ bucket + first_session_duration,
    data = user.metas[sessions > 0, ])

  n= 8999, number of events= 8736

                           coef exp(coef) se(coef)       z Pr(>|z|)
buckettest             -0.01580   0.98433  0.02142  -0.737    0.461
first_session_duration -0.96998   0.37909  0.02052 -47.272   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

                       exp(coef) exp(-coef) lower .95 upper .95
buckettest                0.9843      1.016    0.9438    1.0265
first_session_duration    0.3791      2.638    0.3641    0.3946

Concordance= 0.826  (se = 0.011 )
Rsquare= 0.377   (max possible= 1 )
Likelihood ratio test= 4265  on 2 df,   p=0
Wald test            = 2235  on 2 df,   p=0
Score (logrank) test = 1438  on 2 df,   p=0
Note that first session duration is highly significant (hazard ratio 0.379, p < 2e-16) and predicts a massive amount about survival between sessions. The R^2 jumped from effectively zero to 0.377, although this model is fit only on the 8,999 users with sessions > 0 rather than all 26,920. Sadly, the test condition still fails to reach significance (p=0.461).
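For anyone who wants to poke at the same two model specifications without the real data, here is a rough sketch on simulated users. It assumes the `survival` package is installed. The data frame is a hypothetical stand-in: the column names follow the formulas above, but every value is made up, the 1% censoring rate is an arbitrary assumption, and the subset uses `subset()` on a plain data frame rather than the `user.metas[sessions > 0, ]` call shown in the output.

```r
library(survival)

set.seed(1)
n <- 2000

# Hypothetical stand-in for user.metas; all values are simulated.
user.metas <- data.frame(
  bucket = factor(sample(c("control", "test"), n, replace = TRUE)),
  first_session_duration = rexp(n, rate = 1)
)

# Per-session chance of leaving: lower for users with a longer first session.
# The bucket has no effect by construction.
p_leave <- plogis(1 - user.metas$first_session_duration)
user.metas$sessions <- rgeom(n, prob = p_leave) + 1

# Users still active when observation ends are censored (assumed ~1%).
user.metas$censored <- runif(n) < 0.01

# Model 1: test condition alone.
m1 <- coxph(Surv(sessions, !censored) ~ bucket, data = user.metas)

# Model 2: test condition, controlling for first session duration.
m2 <- coxph(Surv(sessions, !censored) ~ bucket + first_session_duration,
            data = subset(user.metas, sessions > 0))

print(summary(m1)$coefficients)
print(summary(m2)$coefficients)
```

Because the simulated bucket does nothing, this only illustrates the mechanics (event coding via `!censored`, adding a covariate to soak up variance); it says nothing about the real experiment.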