Talk:Pageviews Analysis/Archives/2022/4
This is an archive of past discussions. Do not edit the contents of this page. If you wish to start a new discussion or revive an old one, please do so on the current talk page.
FAQ references dead link, Go Daddy parked Grok.se
- pageviews.wmcloud.org/pageviews/faq
- Why can't I view data older than July 2015
i.e.: stats.grok.se
See parked reference: http://www.grok.se
However, stats.grok.se refers to www.wikishark.com/about.php allegedly by Elad Vardi, Data Science PhD
- scholar.google.com/citations?user=B7lyIvoAAAAJ&hl=en
If it is reliable, maybe update the FAQ. WurmWoode (talk) 12:48, 12 October 2022 (UTC)
Pageviews bug report
In Mozilla, Pageviews does not work for a single page, "Chilina parchappii" on es.wiki: https://pageviews.wmcloud.org/?project=es.wikipedia.org&platform=all-access&agent=all-agents&redirects=1&start=2022-09-01&end=2022-10-01&pages=Chilina_parchappii
- That's a bug that needs to be fixed, but basically what this means is zero pageviews. Expand the date range and you'll see there were some pageviews [1] MusikAnimal (WMF) (talk) 21:12, 9 November 2022 (UTC)
Artificial intelligence monthly pageviews from dashboard do not match with the data dump
Hello,
I looked at the Artificial Intelligence page count for February 2022 on the Pageviews Analysis dashboard, which shows 386,514 pageviews. To work with the data myself, I also downloaded the dump for the same period. After grouping by project and title and summing the pageview values, I compared the results with the dashboard values. All the values are the same except for the Artificial Intelligence page. Could you please advise what the cause might be? The data I downloaded contains only "user" traffic, and I used the same setting on the dashboard (redirects not included, all platforms). Thank you, and I look forward to your response.
Browser used: Microsoft Edge Version 107.0.1418.26 (Official build) (64-bit)
Operating system: Windows 11 2A04:CEC0:11B5:6A72:19BB:94FD:FAD4:A1A 20:11, 7 November 2022 (UTC)
- I am not sure. The Data Engineering team is probably the best to answer this, as they maintain both the API used by Pageviews Analysis and the dumps. See wikitech:Data Engineering/Contact for contact details. MusikAnimal (WMF) (talk) 21:15, 9 November 2022 (UTC)
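The comparison described in this thread (group by project and title, sum the views, then compare with the dashboard) can be sketched roughly as follows. This is only an illustration: it assumes the dump has already been parsed into `(project, title, views)` rows, which is not the actual dump file layout, and the function names and sample values are hypothetical.

```python
from collections import defaultdict


def total_views(rows):
    """Sum view counts per (project, title) pair.

    `rows` is assumed to be an iterable of (project, title, views)
    tuples already parsed from a dump file (hypothetical layout).
    """
    totals = defaultdict(int)
    for project, title, views in rows:
        totals[(project, title)] += int(views)
    return dict(totals)


def mismatches(dump_totals, dashboard_totals):
    """Return {key: (dump_total, dashboard_total)} where the two differ."""
    return {
        key: (dump_totals.get(key, 0), expected)
        for key, expected in dashboard_totals.items()
        if dump_totals.get(key, 0) != expected
    }


# Made-up sample values, for illustration only.
sample_rows = [
    ("en.wikipedia", "Example_page", "3"),
    ("en.wikipedia", "Example_page", 4),
    ("en.wikipedia", "Other_page", 5),
]
sample_dashboard = {
    ("en.wikipedia", "Example_page"): 7,
    ("en.wikipedia", "Other_page"): 6,
}
print(mismatches(total_views(sample_rows), sample_dashboard))
```

Listing only the keys that differ makes it easier to pass a concrete, reproducible case to the Data Engineering team.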
Timeouts, Excel-friendliness, no-sort option?
Great work guys, having been using Massviews a bit I'm liking it a lot, but I do have a few niggles/feedback... I'm looking at pageviews across a single WikiProject, which can be of the order of 30-80k pages, and where there are usually no convenient categories by which to break it down (e.g. if you do it by importance then Low is usually over 20k), so I've been uploading 20k wikilinked article names at a time to my en.wiki sandbox and pointing Massviews at that to get the last year's pageviews. However, it generally fails for about 20% of them, which I assume is because it's running into a 10-minute timeout. Is there any chance this could be increased? Or alternatively, if it reduced server load I'd be OK with taking a lower resolution over that timescale, as I don't need daily data. For my purposes 80,000 articles at 12 x monthly resolution would be much more useful than 20,000 articles at 365 x daily resolution. Any chance of that happening?
Another problem I have is that I often find myself mashing up massviews data with other things in Excel, but it's one of the older versions of Excel that doesn't cope well with Unicode in CSVs, and some of my projects have a lot of non-ASCII characters in (eg a lot of en:Category:Mountains and hills of the Central Highlands or Viking names like en:Guðrøðr Rǫgnvaldsson). So I end up having to copy and paste the names back in. Would it be possible to have a download format that was more old-Excel friendly? One option would be the BOM header hack - add chr(0xEF)chr(0xBB)chr(0xBF) as the first three characters of the CSV, it could be presented as an extra option for "Excel CSV" perhaps? Or alternatively output in .xls(x) or .ods or whatever.
On a related note, it would make it a lot easier to fix these things if Massviews didn't insist on sorting these big datasets by pageviews. Can there be an option on the front page to just not sort? I see there's sort parameters in the URL but they don't seem to work? Cheers. FlagSteward (talk) 19:56, 11 November 2022 (UTC)
- @FlagSteward: I think Pageviews Analysis is not meant for mass data. That would need a different kind of backend access. Do you want all views or only human views? The human views for all wikis for the year 2021 are available as mass statistics here.
- Maybe you could use the current LibreOffice (open source) instead of the old Excel. Dušan Kreheľ (talk) 20:36, 11 November 2022 (UTC)
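For anyone post-processing a downloaded CSV themselves, the BOM header hack suggested above can be sketched in Python roughly as follows. This is an illustration only, not how Massviews generates its exports; the function name and sample rows are hypothetical. It relies on Python's standard `utf-8-sig` codec, which prepends the bytes EF BB BF when encoding.

```python
import csv
import io


def to_excel_csv_bytes(rows):
    """Serialize rows as UTF-8 CSV bytes prefixed with a byte order mark."""
    buffer = io.StringIO(newline="")
    csv.writer(buffer).writerows(rows)
    # The "utf-8-sig" codec writes the BOM (EF BB BF) before the data.
    return buffer.getvalue().encode("utf-8-sig")


# Hypothetical sample rows containing non-ASCII titles.
data = to_excel_csv_bytes([
    ["Title", "Views"],
    ["Guðrøðr Rǫgnvaldsson", 12],
])
print(data[:3])  # the three BOM bytes
```

The resulting bytes can be written to a file opened in binary mode (`"wb"`). Whether a particular older Excel version then detects the encoding correctly would still need to be checked.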