Research talk:Predict Users Search
We don't have that data
What it says on the tin. Or, more accurately: our search logs intentionally do not contain anything that can tie a search request to the previous request or to other browsing activity. There is no PII (outside of the search string itself) and there are no unique IDs. Okeyes (WMF) (talk) 00:40, 4 November 2014 (UTC)
- Responding to the rest of the request:
- "We would need data related to pages users visited (in English), when they visited those pages and if possible if it was through direct link within Wikipedia."
- This, we have, although I don't think it's something you're likely to find researcher time to extract. The R&D team at Wikimedia is ~5 people, all of whom have a lot of responsibilities already.
- "Also, data about the users (gender, age, country...) might relevant be we are not sure at this point."
- I... don't know how we'd get the gender and age of someone from their IP address and user agent, which are the only PII we gather from requests.
- "We would need data related to pages users visited (in English), when they visited those pages and if possible if it was through direct link within Wikipedia."
- So, we don't have most of this data. And the bit we do have is, I'm afraid, a pain to extract; it's unlikely to be worth it, at our end, for an in-class project :(. Okeyes (WMF) (talk) 00:42, 4 November 2014 (UTC)