Wikistats csv

From Meta, a Wikimedia project coordination wiki

Statistics data created by Erik Zachte's Wikistats script are available in comma-separated values (CSV) format for nearly all Wikimedia wikis (see the External links section).


Single measures by language (column) and month (row)

Most of these files contain the same data as StatisticsMonthly.csv (the corresponding column number is given in parentheses). Additionally, the second column contains the sum over all languages (tot).

  • WikipediansContributors.csv (3)
  • WikipediansNew.csv (4)
  • WikipediansEditsGt5.csv (5)
  • WikipediansEditsGt100.csv (6)
  • ArticlesTotal.csv (7)
  • ArticlesTotalAlt.csv (8)
  • ArticlesNewPerDay.csv (9)
  • ArticlesEditsPerArticle.csv (10)
  • ArticlesBytesPerArticle.csv (11)
  • ArticlesGt500Bytes.csv (12, but rounded percentage)
  • ArticlesGt1500Bytes.csv (13?, but rounded percentage)
  • DatabaseEdits.csv (14)
  • DatabaseSize.csv (15)
  • DatabaseWords.csv (16)
  • DatabaseLinks.csv (17)
  • DatabaseWikiLinks.csv (18)
  • DatabaseImageLinks.csv (19)
  • DatabaseExternalLinks.csv (20)
  • DatabaseRedirects.csv (21)

and

  • UsagePageRequest.csv
  • UsageVisits.csv
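
The month-by-language layout described above could be read along these lines. This is only a sketch: it assumes a header row naming the columns and a date in the first column, neither of which is stated on this page, and the sample rows, the header names and the name `parse_single_measure` are invented for illustration.

```perl
use strict;
use warnings;

# Hypothetical sample in the ASSUMED layout: header row, then one row per month.
# Column 1 = month, column 2 = sum over all languages (tot), then one column per language.
my @lines = (
    "date,tot,en,de",
    "01/31/2013,30,20,10",
    "02/28/2013,36,24,12",
);

# Parse rows into a nested hash: $value{month}{column name}.
sub parse_single_measure {
    my @rows   = @_;
    my @header = split /,/, shift @rows;
    my %value;
    for my $row (@rows) {
        my @cells = split /,/, $row, -1;    # -1 keeps trailing empty cells
        my $month = $cells[0];
        for my $i (1 .. $#header) {
            $value{$month}{ $header[$i] } = $cells[$i];
        }
    }
    return \%value;
}

my $measure = parse_single_measure(@lines);
printf "en, 02/28/2013: %s\n", $measure->{'02/28/2013'}{en};
```

Check a real file's first lines before relying on the header assumption; if there is no header row, the column names would have to be supplied separately.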

Editor Counts

For (New) Contributors, use StatisticsMonthly.csv (see above). (Very) Active Editor numbers are no longer taken from that file but from StatisticsUserActivitySpread.csv, which has more categories of activity and more levels of activity.

The data are presented in this table: http://stats.wikimedia.org/EN/TablesWikipediaEN.htm#editor_activity_levels (replace /EN/ with any other language code).

Example: 6 lines for English Wikipedia for Jan 2013:

en,01/31/2013,B,A,248,125,112,105,99,98,92,83,66,62,55,40,27,21,18,13,6,4,2
en,01/31/2013,B,T,94,73,70,62,58,57,54,40,37,33,30,24,11,7,7,5,1,1
en,01/31/2013,B,O,178,140,135,128,116,112,103,93,80,73,63,42,21,15,10,5,1,1
en,01/31/2013,R,A,117769,50673,33469,19393,9656,8049,5827,3414,1511,1155,673,235,52,36,16,5
en,01/31/2013,R,T,29310,13150,8985,5572,3030,2504,1814,1017,412,300,154,41,9,7,3,1
en,01/31/2013,R,O,31823,13213,9085,5640,3026,2526,1793,925,312,201,101,35,5,5,4,1

Perl example:

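use strict ;
use warnings ;

# path is an assumption: point it at a local copy of StatisticsUserActivitySpread.csv
my $file_csv_users_activity_spread = "StatisticsUserActivitySpread.csv" ;

open (my $file_in, "<", $file_csv_users_activity_spread)
  or die "Could not open $file_csv_users_activity_spread: $!" ;
while (my $line = <$file_in>)
{
  chomp ($line) ;
  # count users with over x edits
  # why those strange levels 3, 32, 316, etc ?
  # these are SQRT(10), 10xSQRT(10), 100xSQRT(10), 1000xSQRT(10) (rounded),
  # for finer, evenly spaced levels in charts (316/100 is about 1000/316)
  # thresholds = 1,3,5,10,25,32,50,100,250,316,500,1000,2500,3162,5000,10000,25000,31623,50000,100000,etc
  # table "TablesWikipediaEN.htm#editor_activity_levels" doesn't show all levels
  my ($language, $date, $reguser_bot, $ns_group, @fields) = split (",", $line) ;
  next if $reguser_bot ne "R" ; # R: registered user, B: bot
  next if $ns_group    ne "A" ; # A: articles, T: talk pages, O: other
  my $count_5   = $fields [2] // 0 ; # third threshold in the list (5)
  my $count_100 = $fields [7] // 0 ; # eighth threshold in the list (100)
  print "$language,$date,$count_5,$count_100\n" ;
}
close ($file_in) ;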
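
The hard-coded indexes 2 and 7 in the example follow from the position of 5 and 100 in the threshold list. A small lookup, sketched below, makes that mapping explicit. The names `%index_of` and `count_at` are hypothetical, and treating a level that is absent from the end of a line as zero is an assumption (the sample lines simply stop after the last level present).

```perl
use strict;
use warnings;

# Thresholds as listed in the comment of the example above;
# the position in this list is the index into @fields.
my @thresholds = (1, 3, 5, 10, 25, 32, 50, 100, 250, 316, 500, 1000,
                  2500, 3162, 5000, 10000, 25000, 31623, 50000, 100000);
my %index_of;
@index_of{@thresholds} = (0 .. $#thresholds);

# Return the count stored for one threshold.
# ASSUMPTION: a level missing at the end of the line means zero users.
sub count_at {
    my ($line, $threshold) = @_;
    my (undef, undef, undef, undef, @fields) = split /,/, $line;
    my $i = $index_of{$threshold};
    die "unknown threshold $threshold\n" unless defined $i;
    return $fields[$i] // 0;
}

# Registered users, article namespace, English Wikipedia, January 2013 (from the example lines).
my $line = "en,01/31/2013,R,A,117769,50673,33469,19393,9656,8049,5827,3414,1511,1155,673,235,52,36,16,5";
printf "threshold 5: %d, threshold 100: %d, threshold 25000: %d\n",
    count_at($line, 5), count_at($line, 100), count_at($line, 25000);
```

For the sample line this yields 33469 for threshold 5 and 3414 for threshold 100, the same values the index-based example picks out.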

Editor ranks

Often-used files are the CSVs with the full ranking of contributors (of which WikiStats tables show only the top 20 or top 50): StatisticsActiveUsers.csv, StatisticsSleepingUsers.csv, StatisticsUsers.csv (not updated since 2009); EditsPerUserWIKINAME.csv (not updated since 2012?).

Special statistics

Ploticus

InputPloticus_X.csv with X = A,C,D,E,F,K,L,M,N,O,P,U and InputPloticusTemp.csv.

Categories

There is a CategoriesXX.csv file for each language (replace XX with the uppercase language code).

Other files

LanguageCodes.csv contains one line for each language with:

  1. language code
  2. encoding: always "(utf-8)", even for Wikipedias that do not support UTF-8
  3. category namespace prefix, or "(Category)" if not modified
  4. image namespace prefix, or "(Image)" if not modified
  5. user namespace prefix, or "(User)" if not modified

for instance

de,(utf-8),Kategorie,Bild,Benutzer
en,(utf-8),(Category),(Image),(User)
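
One way to read these lines is sketched below, using the two sample lines above. It assumes that a value wrapped in parentheses always marks an unmodified default; the name `parse_language_codes_line` and the keys of the returned hash are invented for this example.

```perl
use strict;
use warnings;

# Parse one LanguageCodes.csv line into a hash reference.
# ASSUMPTION: a value wrapped in parentheses marks an unmodified default.
sub parse_language_codes_line {
    my ($line) = @_;
    chomp $line;
    my ($code, $encoding, @prefixes) = split /,/, $line;
    $encoding =~ s/^\((.*)\)$/$1/;    # "(utf-8)" -> "utf-8"
    my %entry = (code => $code, encoding => $encoding);
    my @names = qw(category image user);
    for my $i (0 .. $#names) {
        my $prefix = $prefixes[$i] // '';
        # s/// is true only when surrounding parentheses were found and removed
        my $localized = ($prefix =~ s/^\((.*)\)$/$1/) ? 0 : 1;
        $entry{ $names[$i] } = { prefix => $prefix, localized => $localized };
    }
    return \%entry;
}

for my $sample ("de,(utf-8),Kategorie,Bild,Benutzer", "en,(utf-8),(Category),(Image),(User)") {
    my $entry = parse_language_codes_line($sample);
    printf "%s: category prefix '%s' (%s)\n", $entry->{code},
        $entry->{category}{prefix},
        $entry->{category}{localized} ? "localized" : "default";
}
```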

Bots.csv contains one line for each language with a pipe-separated list of bots, e.g.:

en,AtteBot|ReaperBot
de,AtteBot|ReaperBot|GermanBot
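
A bot lookup table can be built from these lines as sketched below. The sub name `parse_bots` is hypothetical, and the sketch assumes that bot names never contain a pipe character.

```perl
use strict;
use warnings;

# Build a lookup: language code => { bot name => 1 }.
# ASSUMPTION: bot names never contain "|".
sub parse_bots {
    my @lines = @_;
    my %bots;
    for my $line (@lines) {
        chomp $line;
        my ($code, $list) = split /,/, $line, 2;    # split on the first comma only
        next unless defined $list;
        $bots{$code}{$_} = 1 for split /\|/, $list;
    }
    return \%bots;
}

my $bots = parse_bots("en,AtteBot|ReaperBot", "de,AtteBot|ReaperBot|GermanBot");
print "GermanBot is listed for de\n"     if     $bots->{de}{GermanBot};
print "GermanBot is not listed for en\n" unless $bots->{en}{GermanBot};
```

Such a table could be used, for instance, to exclude bot accounts when reading the per-user files.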

Layout per file

  • StatisticsEditsPerArticle.csv
  1. language code
  2. total number of revisions
  3. total number of revisions by registered users
  4. total size in bytes of all revisions together, uncompressed (which explains why dump and XML files are so huge)
  5. number of unique registered contributors to this article
  6. number of unique IP addresses of anonymous users who contributed (not exactly the same as the number of unique anonymous contributors, which cannot be known)
  7. full title in UTF-8
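
Because the title comes last, a line in this layout can be split with a field limit so that commas inside the title are preserved. The sketch below assumes the title is not quoted; the column names, the sub name `parse_edits_per_article_line` and the sample line are invented for illustration and are not real data.

```perl
use strict;
use warnings;

# Hypothetical column names for the seven fields listed above.
my @columns = qw(language revisions revisions_registered bytes_all_revisions
                 registered_contributors anonymous_ips title);

sub parse_edits_per_article_line {
    my ($line) = @_;
    chomp $line;
    my %record;
    # Limit the split to 7 fields so commas inside the title stay in the title.
    # ASSUMPTION: the title is the last field and is not quoted.
    @record{@columns} = split /,/, $line, scalar @columns;
    return \%record;
}

# Made-up line, for illustration only.
my $rec = parse_edits_per_article_line("en,120,80,3456789,35,22,Paris, Texas");
printf "%s: %.0f%% of revisions by registered users\n",
    $rec->{title}, 100 * $rec->{revisions_registered} / $rec->{revisions};
```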

Full list of available files

The full list of available files as of December 2013, "wikispecial", is as follows (WIKINAME stands for the name/language of the wiki in question, for wiki-specific files):

ArticlesBytesPerArticle.csv
ArticlesEditsPerArticle.csv
downPerUserPerMonthAllWikisTemp_6_WIKINAME.csv
EditsBreakdownPerUserPerMonthAllWikisTemp_7_WIKINAME.csv
EditsBreakdownPerUserPerMonthWikiLovesMonumentsUploaders.csv
EditsBreakdownPerUserPerMonthWIKINAME.csv
EditsBreakdownPerUserPerMonthWIKINAMEemp.csv
EditsBreakdownPerUserPerMonthZZarticles.csv
EditsBreakdownPerUserPerMonthZZ.csv
EditsBreakdownPerUserPerMonthZZmerged.csv
EditsBreakdownPerUserPerMonthZZsorted.csv
EditsBreakdownPerUserPerMonthZZTemp.csv
EditsPerArticle.csv
EditsPerArticleWIKINAME.csv
EditsPerUser.csv
EditsPerUserWIKINAME.csv
EditsTimestampsWIKINAME.csv
InputPloticus_A.csv
InputPloticus_C.csv
InputPloticus_D.csv
InputPloticus_E.csv
InputPloticus_F.csv
InputPloticus_K.csv
InputPloticus_L.csv
InputPloticus_M.csv
InputPloticus_N.csv
InputPloticus_O.csv
InputPloticus_P.csv
InputPloticusTemp.csv
LanguageCodes.csv
LanguageNamesViaPhp.csv
LanguageNamesViaWpEn.csv
LanguageNamesViaWpEnEdited.csv
ME.csv
Namespaces.csv
N.csv
PageViewsGrowthLastYear.csv
PageViewsPerDayAll.csv
PageViewsPerHourAll.csv
PageViewsPerMonthAll.csv
PageViewsPerMonthAllNormalized.csv
PageViewsPerMonthAllTotalled.csv
PageViewsPerMonthHtmlAllProjects.csv
PageViewsPerWeekAll.csv
PageViewsPerWeekdayAll.csv
Participation.csv
R_PlotData_Binaries.csv
R_PlotData_Editors.csv
R_PlotData_NewArticles.csv
R_PlotData_PageViews.csv
R_PlotData_Uploaders.csv
R_PlotData_Uploads.csv
StatisticsAccessLevels.csv
StatisticsActiveUsers.csv
StatisticsAnonymousUsers.csv
StatisticsBots.csv
StatisticsEditDistribution.csv
StatisticsEditorsPerWiki.csv
StatisticsEditsPerDay.csv
StatisticsEditsPerNamespace.csv
StatisticsEditsPerUsertype.csv
StatisticsLog.csv
StatisticsLogRunTime.csv
StatisticsMonthly.csv
StatisticsNewestDumps.csv
StatisticsPageviewsPerWiki.csv
StatisticsPerBinariesExtension.csv
StatisticsPerNamespace.csv
StatisticsPlotBinariesPerWiki.csv
StatisticsPlotInput.csv
StatisticsRevertsPerMonth.csv
StatisticsSizeDistribution.csv
StatisticsSleepingUsers.csv
StatisticsTimelines.csv
StatisticsUploadsPerWiki.csv
StatisticsUserActivitySpread.csv
StatisticsUsers.csv
StatisticsWeekly.csv
TempEditsPerArticle.csv
TranslateWiki.csv
UserActivityTrendsAllProjectsZZ.csv
UserActivityTrendsEN.csv
UserActivityTrendsNewBinariesWIKINAME.csv
UserActivityTrendsTopUploadersWIKINAME.csv
UserActivityTrendsUploadWizardWIKINAME.csv
UserActivityTrendsWIKINAME.csv
UserActivityTrendsZZ.csv
WhiteListWikis.csv
WikimediaGrowthStats.csv
WikipediansContributors.csv
WikipediansEditsGt100.csv
WikipediansEditsGt5.csv
WikipediansNew.csv
ZeitGeist.csv
ZeitGeist.csvBreakdownPerUserPerMonthWIKINAMEemp.csv

External links

Wikibooks, Wikinews, Wikipedia, Wikiquote, Wikisource, Wikiversity, Wikivoyage, Other projects (incl. Commons, Wikidata and Meta), Wiktionary