Research talk:Data introduction

From Meta, a Wikimedia project coordination wiki
Latest comment: 1 day ago by TBurmeister (WMF) in topic Missing data sources?

Missing data sources?

Are there data sources that are linked on Research:Data that have not been included on this new page, but which you think should be added? My goal was to include the most continuously updated and/or authoritative sources in each data domain, but I also don't want to leave out unique or interesting data sources. TBurmeister (WMF) (talk) 22:25, 17 April 2024 (UTC)

Sorry, but the onus is on you to explain specifically which information you want to delete from Research:Data, and why. The current version of that page is the result of efforts of many different people over more than a decade. Granted, this doesn't mean that you have to agree with their decisions or that you are prohibited from removing their work. But doing so without providing any rationale might be considered disrespectful, and (civility issues aside) is also not likely to be a process that is optimal for determining what content is most useful to readers of that page. (It appears you are a Wikipedian yourself, so perhaps it helps to review w:WP:PRESERVE or recognize the fact that a "Blow it up and start over" approach is usually reserved for severely deficient articles there.) And while there could be parts that are genuinely outdated, that seems a bit less likely considering that your colleague Andre already did an extensive rewrite less than two years ago (and, by the way, did a much better job at explaining specific removals and changes using edit summaries).
Just to illustrate that this is not a theoretical concern, or that I'm not raising it merely out of principle: It appears that you plan to delete all pointers to third-party datasets (and dataset search engines), i.e. those not provided by the Wikimedia Foundation. I disagree with that change and will revert it if needed. There are many such datasets that are in fact of great interest to researchers (I just happened to cover one in the new issue of the research newsletter). More generally, I would ask you to be mindful of the fact that this is a community wiki and that the purpose of a page such as Research:Data should be to document what is useful for researchers to know about this topic - rather than, say, "What are the Wikimedia Foundation's opinions and offerings related to this topic".
And besides removals of particular datasets and links, this also concerns textual content. Again, just to illustrate the general point, one small example: A while ago I added the clickstream dataset to R:Data, with a brief description of what it actually consists of (excerpted from the linked full documentation page, akin to w:WP:SUMMARY), because I think that the name will not be self-explanatory to many readers. Evidently you disagree, considering that you deleted that description in your version. Fine, we can discuss that, but you should transparently flag such removals and provide your rationale, instead of putting the onus on others to find them and ask you about your reasons.
Regards, HaeB (talk) 10:49, 1 May 2024 (UTC)
Thank you for taking the time to leave a comment. Please be assured that I have no desire to disrespect or invalidate the history of contributions that have resulted in the current Research:Data page. As a technical writer working on this content, my main goal is the same as what is stated at the top of the current page: "to help community members, developers, and researchers who are interested in analyzing raw data learn what data and infrastructure is available." Your comment helped me to notice ways in which I can evolve my technical documentation editing/writing process to align more with wiki principles, even when I'm not editing Wikipedia articles. So, thank you for that.
In the time since you left your comment, I have:
For your specific comment about the clickstream dataset: my rationale was that "clickstream" is a standard / recognizable term in the field of web traffic analysis, so most researchers who are the audience of this page wouldn't need that context with the link itself. If they do, that definition is explained quickly and prominently on the linked page. That said, I'm not rigidly opposed to including some contextual text with dataset links, as long as it is minimal -- especially if the links are in a table, keeping the text short is important for display/readability.
I think that since Research:Data is primarily organized by data access method or platform, it would be valuable to include a section that organizes the information by data domain. As a data consumer, I may be more likely to know what kind of data I'm interested in, or what my field of research is, before I know what type of data access method I want to use. Instead of replacing Research:Data, I am going to:
  • Add the "Data domains" section of this page to the Research:Data page as a supplemental section, to complement the information already there that is organized by access method.
  • Leave the rest of this page (the longer, conceptual overview section) here as a separate page, since it serves a somewhat different purpose, but I will add a link to it from Research:Data.
I hope those changes will be beneficial while also being easier to iterate on as an evolution of the Research:Data page. TBurmeister (WMF) (talk) 17:59, 27 June 2024 (UTC)