Research talk:Reading time/Work log/2018-11-02
Saturday, November 3, 2018
Page unloaded event differences by mobile?
As discussed in the meeting with Jon, one possible limitation of the data may threaten our ability to make a fair comparison between mobile readers and desktop readers: do mobile browsers fire pageUnloaded events when readers close or switch apps? If they do not, we will have a large number of pageLoaded events without matching pageUnloaded events, and data will be missing in a way that is correlated with mobile usage. Even if mobile browsers do fire pageUnloaded events, they may do so in situations when we might expect the visiblelength counter to be updated. This would lead to downward bias in mobile reading times.
I wrote this query to compare how often pageLoaded and pageUnloaded events are mismatched on mobile versus desktop.
SELECT COUNT(DISTINCT pagetoken) AS NReaders,
       Mobile,
       SUM(IF(one_each,1,0)) AS n_one_each,
       SUM(IF(not_unloaded,1,0)) AS n_not_unloaded,
       SUM(IF(loaded_more_than_1x,1,0)) AS n_loaded_more_than_1x,
       SUM(IF(unloaded_more_than_1x,1,0)) AS n_unloaded_more_than_1x
FROM
( -- i: one row per page token, flagging its load/unload pattern
  SELECT pagetoken,
         Mobile,
         (SUM(Nloaded) == 1) AND (SUM(Nunloaded) == 1) AS one_each,
         (SUM(Nloaded) == 1) AND (SUM(Nunloaded) == 0) AS not_unloaded,
         SUM(Nloaded) > 1 AS loaded_more_than_1x,
         SUM(Nunloaded) > 1 AS unloaded_more_than_1x
  FROM
  ( -- h: event counts per page token and action
    SELECT pagetoken,
           action,
           Mobile,
           COUNT(*) AS N,
           SUM(IF(action=="pageLoaded", 1, 0)) AS Nloaded,
           SUM(IF(action=="pageUnloaded", 1, 0)) AS Nunloaded
    FROM ( SELECT event.pagetoken AS pagetoken, event.action AS action, webhost LIKE "%.m.%" AS Mobile
           FROM nathante.cleanReadingData WHERE event.namespaceid == 0) g
    GROUP BY pagetoken, action, Mobile
  ) h
  -- Group by page token only (not action) so loads and unloads are summed together
  GROUP BY pagetoken, Mobile
) i
GROUP BY Mobile
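# Fetch the query result into a DataFrame (column names come back lower-cased)
dt = as_pandas(hive_cursor)
# Convert counts to proportions of page tokens within each group
dt['p_one_each'] = dt['n_one_each'] / dt['nreaders']
dt['p_not_unloaded'] = dt['n_not_unloaded'] / dt['nreaders']
dt['p_unloaded_more_than_1x'] = dt['n_unloaded_more_than_1x'] / dt['nreaders']
dt = dt.drop(columns='n_loaded_more_than_1x')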
| | nreaders | mobile | n_one_each | n_not_unloaded | n_unloaded_more_than_1x | p_one_each | p_not_unloaded | p_unloaded_more_than_1x |
|---|---|---|---|---|---|---|---|---|
| 0 | 6 | None | 3 | 2 | 1 | 0.500000 | 0.333333 | 0.166667 |
| 1 | 448722947 | False | 424857529 | 21928737 | 1935907 | 0.946815 | 0.048869 | 0.004314 |
| 2 | 940421259 | True | 400006656 | 535123953 | 5288834 | 0.425348 | 0.569026 | 0.005624 |
As suspected, the incidence of page loaded events without page unloaded events is high on mobile: about 57% of mobile page tokens, compared with about 5% on desktop.
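For anyone who wants to sanity-check the classification logic without access to the Hive table, here is a minimal pandas sketch of the same idea. It assumes load and unload events are summed per page token before the flags are computed. The toy events and the helper name `classify_tokens` are made up for illustration and are not part of the analysis above.

```python
import pandas as pd

# Hypothetical toy events, one row per logged event (not real data).
events = pd.DataFrame(
    {
        "pagetoken": ["a", "a", "b", "c", "c", "c", "d"],
        "action": [
            "pageLoaded", "pageUnloaded",                   # a: one of each
            "pageLoaded",                                   # b: never unloaded
            "pageLoaded", "pageUnloaded", "pageUnloaded",   # c: unloaded twice
            "pageLoaded",                                   # d: never unloaded
        ],
        "mobile": [False, False, True, True, True, True, True],
    }
)


def classify_tokens(events):
    """Count load/unload events per page token and flag each pattern."""
    counts = (
        events.assign(
            n_loaded=events["action"].eq("pageLoaded").astype(int),
            n_unloaded=events["action"].eq("pageUnloaded").astype(int),
        )
        .groupby(["pagetoken", "mobile"], as_index=False)[["n_loaded", "n_unloaded"]]
        .sum()
    )
    counts["one_each"] = (counts["n_loaded"] == 1) & (counts["n_unloaded"] == 1)
    counts["not_unloaded"] = (counts["n_loaded"] == 1) & (counts["n_unloaded"] == 0)
    counts["unloaded_more_than_1x"] = counts["n_unloaded"] > 1
    return counts


counts = classify_tokens(events)

# Share of page tokens showing each pattern, split by mobile vs. desktop.
summary = (
    counts.groupby("mobile")
    .agg(
        n_tokens=("pagetoken", "nunique"),
        p_one_each=("one_each", "mean"),
        p_not_unloaded=("not_unloaded", "mean"),
        p_unloaded_more_than_1x=("unloaded_more_than_1x", "mean"),
    )
    .reset_index()
)
print(summary)
```

In the toy data, two of the three mobile tokens have a load without an unload, which is the pattern that the `p_not_unloaded` column measures in the real table.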
@Jon (WMF): --- FYI
Nevertheless, I am fitting the models that we talked about earlier. — The preceding unsigned comment was added by Groceryheist (talk) 03:03, 3 November 2018 (UTC)
- Thanks for looking into this and quantifying these concerns!
- Also CCing Timo Tijhof, with whom I had a chat about this recently in Portland. He mentioned that Google has proposed a new browser feature that addresses such issues, the "Page Lifecycle API". Actually, from https://developers.google.com/web/updates/2018/07/page-lifecycle-api it seems that this is already live in the most recent versions of Chrome? Jon, would this be something we could try using in the ReadingDepth schema? Regards, Tbayer (WMF) (talk) 02:39, 4 November 2018 (UTC)