Research talk:Reading time/Draft Report

TODOs


[ ] Move a key plot into the intro (especially global south/global north and last in session)

[ ] Bring over entire background section - Olga can revise afterwards

[ ] Bring over motivation for hypotheses

[ ] Bring over discussion of geographic terminology

[ ] Make subpages for the parts that go into the online supplement and transclude them (Nate)

[ ] Reorganize to present the results first

[ ] Olga to go back and look at the metrics insights presentation

[ ] Bring up a key diagram into the introduction

[ ] Make sure the interpretation of mobile vs desktop and development is consistent.

[ ] Make the math a bit smaller.

[ ] Make the big plot look good in smaller browser windows.

[X] Revise preamble into the abstract and intro.

[X] Better curate or organize the bullets in the abstract.

[X] Move bonus results into normal results.

[X] Condense the discussion of the exponentiated Weibull interpretation to consider only the relevant conditions.

[ ] Olga: Why did we choose these language editions in the box plots?

[ ] Connect result on page length to prior work showing that most people don't expand pages on mobile.

[X] Fix order of rows in regression table

[ ] Wikipedia 0 is a possible confounder, but it was never a very large amount of traffic. This doesn't mean that data costs do not account for some of the difference between GN and GS.

@Groceryheist: True, it was less than 1% during this timeframe (up to 2% back in 2016). I guess you meant to write something like "This doesn't mean that data costs do not account for some of the difference between GN and GS"? Regards, Tbayer (WMF) (talk) 20:11, 20 March 2019 (UTC)

Tbayer (WMF) Yeah that's right. Groceryheist (talk) 18:37, 21 March 2019 (UTC)

[X] Add units to marginal effects plots

[X] Do a pass for proofreading and formatting

[X] Make a marginal effects plot for H1

[X] Explain what a marginal effects plot is

[X] Introduce and interpret plots

[X] Label figures

[X] Conclusions

[X] Interpret plots for bonus findings

[X] Add summary statistics for reading time to data section.

[X] Add explanation for estimating the total reading time.

[ ] Make sure that we refer to GoF measures consistently.

[ ] What are the differences between extreme values on mobile and desktop?

[ ] References to background literature.

[X] Future directions in the conclusions.

[ ] Reach out to Leila over email or IRC about the findings.

[ ] Re mobile limitations: how often are total length and visible length different?

~~:[ ] P2 Add t-tests for comparing means to the univariate analysis.~~

[X] Expand on justifying that KS test is a relatively high bar with a sample of 20,000

[ ] P2 consider testing whether this analysis is robust to missing data on mobile.

[ ] What could these findings mean for funding programs?

[X] Organizing framing: background -> list of questions -> focus on countries

[X] Write an abstract for a popular audience

[X] Call mean(log(x)) the geometric mean.

[X] Format plots as centered with 2 columns

[X] Answer hypothesis with Beta and SEs

[X] Add multivariate analysis and hypothesis tests.

[X] Add precise dates for the collection and analysis periods

[X] Tilman, Can you explain how the Reading Depth Schema was merged with the geocoded data and so on?

[X] Tilman, Can you check that my description of first paint and dom interactive time is accurate?

[X] Point out that we remove (impossible) negative values.

[X] Rationale for using mean(log(x)) as the metric, based on lognormal distributions, compared to just the mean.

[X] Mention Zareen who worked on this earlier this year.

[X] How did we identify mobile users?

[X] Improve the plots in the univariate analysis.

[X] What are our sample sizes?

[X] Be consistent about using units or percentages in the abstract; give more context about reference-level numbers.
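
A note on the geometric-mean items in the list above: strictly speaking, mean(log(x)) is the log of the geometric mean, and exponentiating it gives the geometric mean itself. A minimal Python sketch of that relationship follows. The function name and the sample dwell times are made up for illustration and are not taken from the study data or the analysis code.

```python
import math


def geometric_mean(values):
    """Return exp(mean(log(x))) for a non-empty list of strictly positive values."""
    if not values:
        raise ValueError("need at least one value")
    if any(v <= 0 for v in values):
        # log is undefined for zero and negative values
        raise ValueError("values must be strictly positive")
    mean_log = sum(math.log(v) for v in values) / len(values)
    return math.exp(mean_log)


# Hypothetical dwell times in seconds (illustrative only).
dwell_times = [1.0, 10.0, 100.0]

# The geometric mean is pulled far less by the long right tail
# than the arithmetic mean is.
print("geometric mean: ", geometric_mean(dwell_times))
print("arithmetic mean:", sum(dwell_times) / len(dwell_times))
```

For skewed, roughly lognormal data, this is one reason to prefer the geometric mean as a summary: a few very long page views move it much less than they move the arithmetic mean.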

For OpenSym

[ ] What's the intuition for last in session? Is there some prior work for that?

[ ] Sharpen the framing to focus on more compelling, general ideas beyond content consumption.

[ ] Start with a new outline for the front end to motivate hypotheses in terms of prior literature and theories. What is this good for?

[ ] Look for the paper (by a student of César Hidalgo) on biographies across languages

[ ] Summarize known relevant differences between Global North / Global South

[ ] Frame more around hypothesis tests.

[ ] Find a better term than "depth" to describe information seeking tasks.

[X] Remove values from the data that are theoretically impossible.

Groceryheist (talk) 00:27, 2 January 2019 (UTC)

[ ] Create table of frequency of mobile and desktop and session lengths between global north and global south.

Things that we may never get to

[ ] Statistical control for time of day.

[ ] Fit multilevel models

Should we use the variable names in the schema or should we use readable names?

In an academic article I would use more descriptive and readable variable names instead of technical variable names. If the audience for this report is more technical, and likely to include people who might work with this data, then it might be best to be consistent with the schema. What do others think? Groceryheist (talk) 07:15, 30 November 2018 (UTC)

We should strive to make the report accessible for a less technical audience, at least regarding the main takeaways. So +1 to using more descriptive names, as long as we note at some point what field name it corresponds to in the schema / the documented calculations. Regards, Tbayer (WMF) (talk) 21:09, 30 November 2018 (UTC)

Notes on the charts

File:Weibull_dist_enwiki.png:
- Shouldn't this use visibleLength instead of totalLength?
- (same for some other charts:) It would be preferable to use milliseconds instead of seconds as the unit (and annotate the axis accordingly), like in File:WP_page_visible_time_histograms_by_select_wikis.png
File:WP_page_visible_time_histograms_by_select_wikis.png: clean up (or explain) the black character salad on the right ;)
File:Wikipedia_reading_dwell_time_analysis_---_Marginal_effects_of_page_length.png:
- Use non-log (i.e. human readable) units for revision text size

Regards, Tbayer (WMF) (talk) 21:19, 30 November 2018 (UTC)