Research talk:Geo-aggregation of Wikipedia edits
Notes from Oliver
- I really like the premise! You know how I feel about geolocation :D.
- I really like the premise! It seems similar to Priedhorsky's approach but far less computationally intensive.
- I'm concerned about the opt-out/opt-in; that sounds like engineering work :/. I'd absolutely love for us to have an opt-in/out system, though, so I'm not sure where that puts things. At the moment, engineering is...overspent in time, so that's going to need a lot of resourcing outside the direct purview of the Research & Data or Strategic Research teams. I'd note that (on the specific options available) we don't currently note on the editor-side if a user has DNT headers. Ironholds (talk) 18:13, 27 March 2015 (UTC)
- Thanks for the feedback, User:Ironholds! I'm glad you feel the sketch of the anonymization algorithm is a reasonable starting point. Regarding opt-in/out: I totally sympathize. Would it be better to remove those issues from the proposal for now? Shilad (talk) 20:54, 27 March 2015 (UTC)
Notes from Dave
This is a good proposal: I'd love to see this happen.
Notes from Dario
Thanks for getting this started, Shilad. A few notes:
- I second Oliver's concerns about the engineering costs of implementing an ad hoc opt-out mechanism. Other hacky solutions Wikipedians have come up with in the past (like setting up a list of usernames on a wiki page) are hard to use and enforce. The good news is that the Analytics Engineering and UX teams are working on the design of opt-in / opt-out tools for traffic data. We'll have to either fold this proposal under the same project or figure out with the Foundation's Legal team how to handle the privacy concerns of geotagging edits otherwise (we have a lot of existing context about this in the Legal team). I don't think this proposal needs to be subject to an opt-in, as the privacy policy allows WMF to collect IP addresses and the associated information as private data, as long as their publication is subject to adequate aggregation and retention.
- There's no explicit mention in the proposal (other than an example in the data format section) of the desired temporal granularity of the dataset. What's the minimum time range that would produce usable data for this research?
- I wanted to clarify that there's no expectation to produce data with a geographic component aggregated at sub-project level.
- Finally, could you add a few thoughts on extending the proposal to all articles, not just geotagged articles?
--Dario (WMF) (talk) 20:04, 10 April 2015 (UTC)
Thanks for the excellent feedback, User:Dario (WMF)! I've tried to address your points in the latest revision. Please let me know if you have any more questions or feedback! Shilad (talk) 11:04, 20 April 2015 (UTC)