Talk:Pageviews Analysis/Archives/2022/2
This is an archive of past discussions. Do not edit the contents of this page. If you wish to start a new discussion or revive an old one, please do so on the current talk page.
Not working
The Pageviews Analysis tool doesn't seem to be working at the moment (example). Everything is out of place and there's no graph. I'm looking at it on Firefox. It was ok a few days ago. G-13114 (talk) 22:26, 26 February 2022 (UTC)
- I confirm that problem. Jacek555 (talk) 11:50, 27 February 2022 (UTC)
- Ditto, with Firefox 97.0. Working fine in Chrome 68.0.3440.106.
Cross-Origin Request Blocked: the Same Origin Policy disallows reading the remote resource at https://wikimedia.org/api/rest_v1/metrics/pageviews/top/en.wikipedia/all-access/2022/01/all-days. (Reason: CORS request did not succeed). Status code: (null).
Uncaught TypeError: i.responseJSON is undefined application-6f4672523f.js:12:29312
o https://pageviews.wmcloud.org/topviews/application-6f4672523f.js:12
u https://pageviews.wmcloud.org/topviews/application-6f4672523f.js:2
fireWith https://pageviews.wmcloud.org/topviews/application-6f4672523f.js:2
i https://pageviews.wmcloud.org/topviews/application-6f4672523f.js:3
t https://pageviews.wmcloud.org/topviews/application-6f4672523f.js:3
- - Erik Baas (talk) 12:02, 27 February 2022 (UTC)
- @G-13114 @Jacek555 @Erik Baas Are you using a privacy browser plugin such as PrivacyBadger or uBlock? You will need to disable it for wmcloud.org. The fix to the ad blockers has been merged and will soon make its way to your plugins, at which time you can remove wmcloud.org from your allowlist if you want. MusikAnimal (WMF) (talk) 19:04, 27 February 2022 (UTC)
- I think it's better to make this service available to all without hassles and technical details. There are likely many users who use privacy browsers. JMK (talk) 06:23, 28 February 2022 (UTC)
- If something like the fix above had been applied (and propagated) before the actual URL change, no user-facing issue would have happened. Was it simply overlooked? While not strictly required (and individual users are ultimately responsible for the choice of plugins and browsers they use), it would have been nice if it had been done. whym (talk) 11:56, 28 February 2022 (UTC)
- It seems to be working properly again now. G-13114 (talk) 17:05, 28 February 2022 (UTC)
- individual users are ultimately responsible for the choice of plugins and browsers they use – precisely. This problem isn't unique to Pageviews Analysis. Privacy plugins break lots of web pages and the burden falls on the user to realize when it needs to be disabled. But to answer your question, yes, the update to EasyList (the allowlist used by these plugins) was overlooked. That was added years ago by someone else and it didn't occur to me to update it. My apologies! There's supposed to be a message shown when a privacy plugin prevents loading of Pageviews Analysis, and that apparently has broken. I'll try to get that fixed. MusikAnimal (WMF) (talk) 17:47, 28 February 2022 (UTC)
I do not use any privacy browser plugin, but Massviews has not worked on my phone for half a year. --Maxaxax (talk) 20:34, 23 April 2022 (UTC)
Pageviews broken in Safari
Since last week it seems that Pageviews is broken on Safari. It is effectively a white page with some language links at the side and a selection panel at the end for different statistical graphics. Clearing the cache does not help. --Maphry (talk) 07:35, 30 March 2022 (UTC)
- I have the same issue. Safari won't load JS and CSS. For example, when accessing https://pageviews.wmcloud.org/?pages=Ammonia&project=en.wikipedia.org, Safari's Web Inspector displays the following errors:
[Info] Content blocker prevented frame displaying https://pageviews.wmcloud.org/?pages=Ammonia&project=en.wikipedia.org from loading a resource from https://pageviews.wmcloud.org/pageviews/application-f3e320f008.js
[Info] Content blocker prevented frame displaying https://pageviews.wmcloud.org/?pages=Ammonia&project=en.wikipedia.org from loading a resource from https://pageviews.wmcloud.org/ad_block_test.js
[Info] Content blocker prevented frame displaying https://pageviews.wmcloud.org/?pages=Ammonia&project=en.wikipedia.org from loading a resource from https://pageviews.wmcloud.org/pageviews/application-a01dec23b5.css
- Please note that other tools, for example, https://xtools.wmflabs.org/articleinfo/en.wikipedia.org/Ammonia, display fine (JS and CSS are loaded), so it's not an issue with Safari – it's an issue with Pageviews Analysis. — UnladenSwallow (talk) 06:04, 22 May 2022 (UTC)
- I'm using AdGuard for Safari. Disabling it fixed the issue. However, I would prefer that Pageviews Analysis worked properly with AdGuard for Safari enabled, like other Wikipedia page analysis tools. — UnladenSwallow (talk) 06:27, 22 May 2022 (UTC)
Rate limit issue
When I put in wikt:Category:Hebrew lemmas, the tool started working okay but broke at around the 5000th page, and then spat out "Error querying Pageviews API - Unknown" for every single item thereafter. This seems to be a rate limit issue. Could you change it to go slower, or make the rate limit more generous, so as to avoid this? I also tested it on other large categories of the English Wiktionary and this always happened. Thank you. 70.172.194.25 05:21, 15 April 2022 (UTC)
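(A client-side workaround, for anyone hitting the same wall: query the per-article Pageviews REST endpoint yourself and pause between requests. The sketch below is only illustrative - it assumes the documented per-article endpoint, an arbitrary one-second delay, and placeholder titles and User-Agent string; it is not how Massviews itself works.)
<syntaxhighlight lang="python">
import time
import requests

# Hedged sketch: fetch daily views for titles one at a time, sleeping between
# requests so we stay well below any server-side rate limit. The endpoint is
# the Wikimedia per-article pageviews REST API; the delay, example titles and
# User-Agent string are assumptions, not Massviews settings.
API = ("https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article/"
       "{project}/all-access/user/{title}/daily/{start}/{end}")
HEADERS = {"User-Agent": "pageviews-throttle-sketch (put your contact info here)"}

def daily_total(project, title, start, end, delay=1.0):
    url = API.format(project=project,
                     title=requests.utils.quote(title, safe=""),
                     start=start, end=end)
    resp = requests.get(url, headers=HEADERS)
    time.sleep(delay)                 # go slower than the tool does
    if resp.status_code == 429:       # throttled anyway: back off and retry once
        time.sleep(10)
        resp = requests.get(url, headers=HEADERS)
    resp.raise_for_status()
    return sum(item["views"] for item in resp.json().get("items", []))

# Hypothetical usage with two titles from the category:
# for t in ["שלום", "בית"]:
#     print(t, daily_total("en.wiktionary", t, "20220101", "20220331"))
</syntaxhighlight>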
Handling categories over 20,000 articles
First off - having been away for nearly a decade, great job all! But it does mean that I'm having to convert some old code that used to painstakingly get pageviews article by article from grok.se. Massviews would appear to be perfect for my purpose, except that I work on assessments with some of the bigger en:WikiProjects such as WP Italy. You'll see from their assessments table that they have 81k articles in total, with both Low and Stub over 20k, and WP Scotland has just their Low-importance category over 20k. Now I don't care too much how I get pageviews for all of those (and just a one-off way to get the Scottish Lows would allow me to get on with a load of stuff whilst longer-term solutions were found), but that's what I'm after - actually twice over, as I mostly work on 365-day totals but grab a 14-day sample for comparison to reality-check that there are no issues with new pages, redirects converting into articles and so on. I don't mind doing it "manually" by importance category as there are only 6 of them, but Low importance would need to be split in three as it's 42k. Still, that's doable, whereas doing 20-odd quality classes (and stubs would still need splitting) is getting into the realms of wanting a script. There are some other projects where there's an importance category that's over 20k but no quality categories are, so a scriptable way to do it by quality would keep me busy in the short term.
However, I don't see a way to get article 20,001 - there doesn't seem to be an equivalent of cmcontinue, which on the main API allows you to keep track of where you are in a big category. Or have I just not seen it? I've also tried a TOC-type URL using a URL-encoded ?from= parameter, but that didn't work either. Is there another way to do this via PagePile or Quarry, or do they have the same 20k limit? I see there was a reference back in 2017 to doing something about the 20k article limit, but I assume there's been no progress on that.
So what's my best way forward? To sum up:
- Ideally I'm looking for a long-term solution for categories with more than 20k articles, but a one-off way to get 14- and 365-day pageviews for the 28k articles in en:Category:Low-importance Scotland articles would shut me up for a good while
- Are Quarry, PagePile etc. subject to the same 20k limit? If not, what's the best way to use them?
- I don't mind doing things manually and don't need a fancy interface; CSV or JSON is fine
- My preference would be for an API-style URL that I can request and get my CSV or JSON in return. I'm happy for the article limit to be cut to 5000 or 1000 or whatever in that case, even if it has to be throttled to 1 or 5 minutes between requests; I'm patient.
- But that needs some way of keeping track of where you are in a big category; it feels like category URLs with a ?from= parameter would be the easiest way to do that, but a cmcontinue equivalent would also work.
Cheers. FlagSteward (talk) 11:28, 22 May 2022 (UTC)
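(For reference, the 20k cap appears to be a limit in Massviews itself rather than in the MediaWiki API: list=categorymembers pages through a category of any size by following cmcontinue. A rough sketch follows - the category name comes from the request above, while the User-Agent string and the rest of the wiring are assumptions, not an existing Massviews feature.)
<syntaxhighlight lang="python">
import requests

# Hedged sketch: enumerate every member of a large category via the Action API,
# following the cmcontinue token until the category is exhausted.
API = "https://en.wikipedia.org/w/api.php"
HEADERS = {"User-Agent": "category-paging-sketch (put your contact info here)"}

def category_members(category):
    params = {
        "action": "query",
        "list": "categorymembers",
        "cmtitle": category,
        "cmlimit": "500",            # maximum per request for ordinary accounts
        "format": "json",
    }
    while True:
        data = requests.get(API, params=params, headers=HEADERS).json()
        for member in data["query"]["categorymembers"]:
            yield member["title"]
        if "continue" not in data:   # no more pages in the category
            break
        params.update(data["continue"])   # carries cmcontinue into the next request

titles = list(category_members("Category:Low-importance Scotland articles"))
print(len(titles))   # can be well past 20,000; the API just keeps paging
</syntaxhighlight>
(If the assessment category holds talk pages, as WikiProject banner categories usually do, you would strip the Talk: prefix from each title before looking up its pageviews.)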
- @FlagSteward This may help you. ✍️ Dušan Kreheľ (talk) 13:34, 22 May 2022 (UTC)
- Thanks @Dušan Kreheľ:, I was aware of the dumps, but I was rather hoping not to have to use them, mostly because the enwiki one is >30GB unzipped and I'm not sure my computer will cope, and I haven't really got a workflow for it. I've opened one of the small ones just to look at the format, and that was fine, just opening it in Excel with text-to-columns. Is the enwiki one still just one very big text file, or is it split up? What's the recommended way of processing it? FlagSteward (talk) 21:55, 22 May 2022 (UTC)
- @FlagSteward You have several options:
- I think the best way would be a computer program where you select the category or pages, and it downloads the data online or exports it from my dump, in the same output format as my pageview dump export. Would you like that? I could create such a program.
- The enwiki dump is not split.
- Your computer is not too weak for this job if you use the right technologies. Dušan Kreheľ (talk) 08:33, 23 May 2022 (UTC)
- Thanks for the offer @Dušan Kreheľ:, but I should be OK, although I'm sure a program of the kind you suggest would be useful to others. At the moment I'm just trying to understand my options (and perhaps give @MusikAnimal (WMF): a little nudge to e.g. get ?from= parameters working in Massviews <g>). I didn't know that you could now get pageviews through the main Wikimedia API - am I right in thinking it won't let you pipe articles to get pageviews for more than one at a time? I tried it with the online test and pipes didn't seem to work.
- As background, I'm trying to revive a workflow which I've not used since grok.se disappeared but which uses a couple of PHP scripts that originate from 15+ years ago. So they were created in the days when a lot of this stuff had to be done by HTML scraping, although they've been updated in places to use the API for on-wiki stuff (but not pageviews). But the pageviews were done by scraping grok.se at 1 article per second overnight - not ideal, but it worked. The PHP generates a couple of CSVs which I then manually put together and review, and then the final CSV ends up as the input to an AWB bot. I'll probably end up rewriting the PHP as a single Python script at some point, but I have limited time to spend on it at the moment so I'd rather do it in ways that fit with my existing code rather than doing something completely new.
- It would probably be useful for me anyway to write a function getting pageviews from the Wiki API as a direct drop-in replacement for my grok.se function. Given that my workflow is not fully automated anyway, I don't mind doing 6 Massviews runs to get the pageviews for e.g. WP Scotland, apart from 20-to-28k of the Low-importance articles, and then using my script to get pageviews for that last 8k from the API - at 1 per second it'll take just over 2 hours. It's messy, and it's not ideal in terms of server load, but it's not going to be something I do very often, and it's probably the most efficient from the point of view of my time spent coding. I will have a look at the dump though, and see how my computer copes with it. FlagSteward (talk) 16:10, 23 May 2022 (UTC)
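(For what it's worth, a grok.se-style drop-in against the per-article REST endpoint could look roughly like the sketch below. The function name, example article and User-Agent string are placeholders, and you would still want to sleep about a second between calls, as described above.)
<syntaxhighlight lang="python">
from datetime import date, timedelta
import requests

# Hedged sketch of a drop-in replacement: total views for one article over the
# last N days, via the per-article Pageviews REST endpoint.
API = ("https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article/"
       "{project}/all-access/user/{title}/daily/{start}/{end}")
HEADERS = {"User-Agent": "grokse-replacement-sketch (put your contact info here)"}

def total_views(title, days, project="en.wikipedia"):
    end = date.today() - timedelta(days=1)       # yesterday; today's data is incomplete
    start = end - timedelta(days=days - 1)
    url = API.format(project=project,
                     title=requests.utils.quote(title.replace(" ", "_"), safe=""),
                     start=start.strftime("%Y%m%d"),
                     end=end.strftime("%Y%m%d"))
    resp = requests.get(url, headers=HEADERS)
    resp.raise_for_status()
    return sum(item["views"] for item in resp.json().get("items", []))

# Hypothetical usage: 14-day and 365-day totals for one article.
# print(total_views("Edinburgh", 14), total_views("Edinburgh", 365))
</syntaxhighlight>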
Please add a clearer explanation of how to get page views from other Wikimedia projects, e.g. Wikidata
Hi
I'm trying to find out how useful the documentation I've written on Wikidata has been, so I'm trying to use Pageviews to get the number of views. When I use Wikidata as the site, it says it's not supported and then links me to a giant page of code: https://meta.wikimedia.org/w/api.php?action=sitematrix&formatversion=2 . Please can someone explain how to get page views for documentation pages on Wikidata and include it in the instructions?
Thanks
John Cummings (talk) 10:00, 31 May 2022 (UTC)
- @John Cummings You need to use the domain name, in this case "wikidata.org". the wub "?!" 16:32, 3 June 2022 (UTC)
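(For example - the page name here is just an illustration, not necessarily one of the documentation pages in question - a URL of the form https://pageviews.wmcloud.org/?pages=Help:Items&project=wikidata.org should then show the view counts for that Wikidata page, following the same ?pages=...&project=... pattern used for Wikipedia articles earlier in this archive.)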