
Research talk:Reading time/Work log/2018-11-06

From Meta, a Wikimedia project coordination wiki

Tuesday, November 6, 2018

Model 1 Results

I had some misadventures fitting these models with Spark, which was tantalizingly close to working but inexplicably denied me p-values on several occasions. (TODO: Ask a Stack Overflow question about this, since I have not yet received a response to my comment on a similar question.)

We do not have enough memory to fit the whole specification on the notebook machines, even with a stratified sample of 200 observations in each stratum. There are just too many variables. However, we can fit a separate model for each wiki without any problem. This lets us use the "secret weapon": fitting the same model separately for each wiki and comparing the estimates across wikis. Last night I fit these models, and this morning I made a plot of the estimates for a selection of wikis.
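
As a rough illustration of the per-wiki approach, here is a minimal sketch in plain Python. It is not the code used to produce these results. The field names (`wiki`, `stratum`, `x`, `visible_time`), the single predictor, and the normal-approximation 95% interval are all assumptions made for the example; the actual models have many more variables.

```python
import math
import random
from collections import defaultdict


def stratified_sample(rows, strata_key, n_per_stratum, seed=0):
    """Draw up to n_per_stratum rows from each stratum, without replacement."""
    rng = random.Random(seed)
    by_stratum = defaultdict(list)
    for row in rows:
        by_stratum[strata_key(row)].append(row)
    sample = []
    for key in sorted(by_stratum):
        group = by_stratum[key]
        sample.extend(rng.sample(group, min(n_per_stratum, len(group))))
    return sample


def fit_simple_ols(xs, ys, z=1.96):
    """Fit y = a + b*x by least squares.

    Returns the slope with a normal-approximation confidence interval
    (z=1.96 gives roughly 95%).
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    rss = sum((y - intercept - slope * x) ** 2 for x, y in zip(xs, ys))
    se = math.sqrt(rss / (n - 2) / sxx)  # standard error of the slope
    return {"slope": slope, "lower": slope - z * se, "upper": slope + z * se}


def fit_per_wiki(rows, n_per_stratum=200):
    """Fit one model per wiki on a stratified sample; collect the estimates."""
    by_wiki = defaultdict(list)
    for row in rows:
        by_wiki[row["wiki"]].append(row)  # hypothetical field name
    estimates = {}
    for wiki, wiki_rows in sorted(by_wiki.items()):
        sample = stratified_sample(wiki_rows, lambda r: r["stratum"], n_per_stratum)
        xs = [r["x"] for r in sample]  # hypothetical single predictor
        ys = [r["visible_time"] for r in sample]  # hypothetical outcome field
        estimates[wiki] = fit_simple_ols(xs, ys)
    return estimates
```

Because each wiki is fit independently, only one wiki's sample needs to be in memory at a time, and the resulting dictionary of estimates and intervals can be plotted side by side to compare wikis.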


This chart shows model estimates with confidence intervals for regression models predicting the time a page was visible in the browser. Each model is fit on a stratified sample of data from a different wiki.