Wikistats csv
Statistics data created by Erik Zachte's Wikistats script are available in comma-separated values (CSV) format for nearly all Wikimedia wikis (see the External links section).
Single measures by language (column) and month (row)
Most of these files contain the same data as StatisticsMonthly.csv (the corresponding column number is given in parentheses). Additionally, the second column contains the sum of all languages (tot).
- WikipediansContributors.csv (3)
- WikipediansNew.csv (4)
- WikipediansEditsGt5.csv (5)
- WikipediansEditsGt100.csv (6)
- ArticlesTotal.csv (7)
- ArticlesTotalAlt.csv (8)
- ArticlesNewPerDay.csv (9)
- ArticlesEditsPerArticle.csv (10)
- ArticlesBytesPerArticle.csv (11)
- ArticlesGt500Bytes.csv (12, but rounded percentage)
- ArticlesGt1500Bytes.csv (13?, but rounded percentage)
- DatabaseEdits.csv (14)
- DatabaseSize.csv (15)
- DatabaseWords.csv (16)
- DatabaseLinks.csv (17)
- DatabaseWikiLinks.csv (18)
- DatabaseImageLinks.csv (19)
- DatabaseExternalLinks.csv (20)
- DatabaseRedirects.csv (21)
and
- UsagePageRequest.csv
- UsageVisits.csv
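As a rough illustration of the month-by-language layout described above, the following Python sketch reads one single-measure file into a nested dictionary. The header row, the date format and the sample values are assumptions made for the example; this page only states that rows are months, columns are languages, and the second column holds the total. The function name `read_measure` is hypothetical.

```python
import csv
import io

# Invented sample, NOT real Wikistats output. Assumed layout: a header row,
# the month in column 1, the all-language total ("tot") in column 2,
# then one column per language.
SAMPLE = """\
date,tot,de,en
12/31/2012,30,10,20
01/31/2013,36,12,24
"""

def read_measure(lines):
    """Return {month: {column name: value}} for one single-measure file."""
    reader = csv.reader(lines)
    header = next(reader)          # assumed: first row names the columns
    table = {}
    for row in reader:
        if not row:
            continue               # skip blank lines
        month, *values = row
        table[month] = {
            name: float(value)
            for name, value in zip(header[1:], values)
            if value != ""         # leave out empty cells
        }
    return table

table = read_measure(io.StringIO(SAMPLE))
print(table["01/31/2013"]["en"], table["01/31/2013"]["tot"])
```

If the real files have no header row, the column names would have to come from elsewhere (for example LanguageCodes.csv), and the sketch would need adjusting.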
Editor Counts
For (new) contributors, use the file StatisticsMonthly.csv (see above). (Very) active editor numbers are no longer taken from that file but from StatisticsUserActivitySpread.csv, which has more categories and more levels of activity.
Data are presented in this table: http://stats.wikimedia.org/EN/TablesWikipediaEN.htm#editor_activity_levels (replace EN in /EN/ with any language code).
Example: 6 lines for English Wikipedia for Jan 2013:
en,01/31/2013,B,A,248,125,112,105,99,98,92,83,66,62,55,40,27,21,18,13,6,4,2
en,01/31/2013,B,T,94,73,70,62,58,57,54,40,37,33,30,24,11,7,7,5,1,1
en,01/31/2013,B,O,178,140,135,128,116,112,103,93,80,73,63,42,21,15,10,5,1,1
en,01/31/2013,R,A,117769,50673,33469,19393,9656,8049,5827,3414,1511,1155,673,235,52,36,16,5
en,01/31/2013,R,T,29310,13150,8985,5572,3030,2504,1814,1017,412,300,154,41,9,7,3,1
en,01/31/2013,R,O,31823,13213,9085,5640,3026,2526,1793,925,312,201,101,35,5,5,4,1
Perl example:
open (FILE_IN, "<", $file_csv_users_activity_spread) || die "Could not open $file_csv_users_activity_spread: $!" ;
while ($line = <FILE_IN>)
{
  chomp ($line) ;

  # count users with over x edits
  # why those strange levels 3, 32, 316, etc ?
  # these are SQRT(10), 10xSQRT(10), 100xSQRT(10), 1000xSQRT(10),
  # for finer evenly spaced levels in charts (316/100 is 1000/316)
  # thresholds = 1,3,5,10,25,32,50,100,250,316,500,1000,2500,3162,5000,10000,25000,31623,50000,100000,etc
  # table "TablesWikipediaEN.htm#editor_activity_levels" doesn't show all levels
  ($language, $date, $reguser_bot, $ns_group, @fields) = split (",", $line) ;

  if ($reguser_bot ne "R") { next ; } # R: registered user, B: bot
  if ($ns_group ne "A") { next ; } # A: articles, T: talk pages, O: other

  $count_5 = $fields [2] ; # third threshold: 5 edits
  $count_100 = $fields [7] ; # eighth threshold: 100 edits
}
close (FILE_IN) ;
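For readers who prefer Python, the same lookup can be sketched as follows. This is not part of Wikistats. The helper names `thresholds` and `editors_by_threshold` are hypothetical, and the threshold sequence is reconstructed from the list given in the Perl comments (1, 3, 5, then 10^k, 2.5 x 10^k, SQRT(10) x 10^k and 5 x 10^k per decade). The sample data are the six example lines for English Wikipedia shown above.

```python
import csv
import io
import math

# The six example lines for English Wikipedia, January 2013.
SAMPLE = """\
en,01/31/2013,B,A,248,125,112,105,99,98,92,83,66,62,55,40,27,21,18,13,6,4,2
en,01/31/2013,B,T,94,73,70,62,58,57,54,40,37,33,30,24,11,7,7,5,1,1
en,01/31/2013,B,O,178,140,135,128,116,112,103,93,80,73,63,42,21,15,10,5,1,1
en,01/31/2013,R,A,117769,50673,33469,19393,9656,8049,5827,3414,1511,1155,673,235,52,36,16,5
en,01/31/2013,R,T,29310,13150,8985,5572,3030,2504,1814,1017,412,300,154,41,9,7,3,1
en,01/31/2013,R,O,31823,13213,9085,5640,3026,2526,1793,925,312,201,101,35,5,5,4,1
"""

def thresholds(count):
    """Return the first `count` activity levels: 1, 3, 5, 10, 25, 32, 50, 100, ..."""
    levels = [1, 3, 5]
    decade = 10
    while len(levels) < count:
        levels += [decade, decade * 5 // 2,
                   round(decade * math.sqrt(10)), decade * 5]
        decade *= 10
    return levels[:count]

def editors_by_threshold(lines, reguser_bot="R", ns_group="A"):
    """Yield (language, date, {threshold: editor count}) for matching rows.

    reguser_bot: "R" registered user, "B" bot.
    ns_group: "A" articles, "T" talk pages, "O" other.
    """
    for row in csv.reader(lines):
        if not row:
            continue
        language, date, kind, group, *fields = row
        if kind != reguser_bot or group != ns_group:
            continue
        counts = dict(zip(thresholds(len(fields)), map(int, fields)))
        yield language, date, counts

for language, date, counts in editors_by_threshold(io.StringIO(SAMPLE)):
    print(language, date, counts[5], counts[100])
```

Keying the counts by threshold rather than by position avoids hard-coding indexes such as 2 and 7, at the cost of relying on the reconstructed threshold sequence being complete.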
Editor ranks
Often-used files are the CSVs with the full ranking of contributors (of which the Wikistats tables show only the top 20 or top 50): StatisticsActiveUsers.csv, StatisticsSleepingUsers.csv, StatisticsUsers.csv (not updated since 2009); EditsPerUserWIKINAME.csv (not updated since 2012?).
Special statistics
Ploticus
InputPloticus_X.csv with X = A, C, D, E, F, K, L, M, N, O, P, U, and InputPloticusTemp.csv.
Categories
There is a CategoriesXX.csv file for each language (replace XX with the uppercase language code).
Other files
LanguageCodes.csv contains one line for each language with:
- language code
- encoding (utf-8); this value is always utf-8, even for Wikipedias not supporting utf-8!
- category namespace prefix, "(Category)" if not modified
- image namespace prefix, "(Image)" if not modified
- user namespace prefix, "(User)" if not modified
For instance:
de,(utf-8),Kategorie,Bild,Benutzer
en,(utf-8),(Category),(Image),(User)
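A minimal Python sketch of reading these lines, using the two example lines above as input. The names `LanguageInfo`, `unwrap` and `parse_language_codes` are hypothetical, and stripping the parentheses is a choice made for this example, since it discards the distinction between default and localized prefixes.

```python
from typing import NamedTuple

class LanguageInfo(NamedTuple):
    code: str
    encoding: str
    category_prefix: str
    image_prefix: str
    user_prefix: str

def unwrap(value):
    """Drop surrounding parentheses, e.g. "(Category)" -> "Category"."""
    if value.startswith("(") and value.endswith(")"):
        return value[1:-1]
    return value

def parse_language_codes(lines):
    """Return {language code: LanguageInfo} from LanguageCodes.csv lines."""
    result = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        code, encoding, category, image, user = line.split(",")
        result[code] = LanguageInfo(
            code, unwrap(encoding), unwrap(category), unwrap(image), unwrap(user)
        )
    return result

info = parse_language_codes([
    "de,(utf-8),Kategorie,Bild,Benutzer",
    "en,(utf-8),(Category),(Image),(User)",
])
print(info["de"].category_prefix, info["en"].category_prefix)
```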
Bots.csv contains one line for each language with a pipe-separated list of bots, e.g.:
en,AtteBot|ReaperBot
de,AtteBot|ReaperBot|GermanBot
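The pipe-separated format can be read with a few lines of Python. The sketch below uses the two example lines above. The function name `parse_bots` is hypothetical, and splitting on the first comma only is a precaution in case a bot name contains a comma, which this page does not address.

```python
def parse_bots(lines):
    """Return {language code: set of bot names} from Bots.csv lines."""
    bots = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # Split on the first comma only; the rest is the pipe-separated list.
        language, _, names = line.partition(",")
        bots[language] = set(names.split("|")) if names else set()
    return bots

bots = parse_bots([
    "en,AtteBot|ReaperBot",
    "de,AtteBot|ReaperBot|GermanBot",
])
print(sorted(bots["de"] - bots["en"]))
```

Storing the names as sets makes it cheap to test whether a given user name is a known bot when filtering other files.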
Layout per file
- StatisticsEditsPerArticle.csv
  - language code
  - total number of revisions
  - total number of revisions by registered users
  - total size in bytes of all revisions together, uncompressed (explains why dump and XML files are so huge)
  - number of unique registered contributors to this article
  - number of unique IP addresses of anonymous users that contributed (not exactly the same as the number of unique anonymous contributors, which cannot be known)
  - full title in UTF-8
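The field order listed above maps naturally onto a small record type. The following Python sketch is illustrative only: the names `ArticleEdits` and `parse_article_line` are hypothetical, the sample line is invented (not real data), and limiting the split to six commas assumes that the title is the last field and may itself contain commas, which this page does not confirm.

```python
from typing import NamedTuple

class ArticleEdits(NamedTuple):
    language: str
    revisions: int                # total number of revisions
    registered_revisions: int     # revisions by registered users
    total_bytes: int              # size of all revisions together, uncompressed
    registered_contributors: int  # unique registered contributors
    anonymous_ips: int            # unique IP addresses of anonymous contributors
    title: str

def parse_article_line(line):
    """Parse one StatisticsEditsPerArticle.csv line into an ArticleEdits record."""
    # Split at most six times so commas inside the title are preserved.
    language, *numbers, title = line.rstrip("\n").split(",", 6)
    return ArticleEdits(language, *map(int, numbers), title)

# Invented sample line for illustration only.
record = parse_article_line("en,120,90,3456789,40,25,Example, with comma")
anonymous_revisions = record.revisions - record.registered_revisions
print(record.title, anonymous_revisions)
```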
Full list of available files
The full list of available files as of December 2013, "wikispecial", is as follows (WIKINAME stands for the name/language of the wiki in question, for wiki-specific files):
ArticlesBytesPerArticle.csv ArticlesEditsPerArticle.csv downPerUserPerMonthAllWikisTemp_6_WIKINAME.csv EditsBreakdownPerUserPerMonthAllWikisTemp_7_WIKINAME.csv EditsBreakdownPerUserPerMonthWikiLovesMonumentsUploaders.csv EditsBreakdownPerUserPerMonthWIKINAME.csv EditsBreakdownPerUserPerMonthWIKINAMEemp.csv EditsBreakdownPerUserPerMonthZZarticles.csv EditsBreakdownPerUserPerMonthZZ.csv EditsBreakdownPerUserPerMonthZZmerged.csv EditsBreakdownPerUserPerMonthZZsorted.csv EditsBreakdownPerUserPerMonthZZTemp.csv EditsPerArticle.csv EditsPerArticleWIKINAME.csv EditsPerUser.csv EditsPerUserWIKINAME.csv EditsTimestampsWIKINAME.csv InputPloticus_A.csv InputPloticus_C.csv InputPloticus_D.csv InputPloticus_E.csv InputPloticus_F.csv InputPloticus_K.csv InputPloticus_L.csv InputPloticus_M.csv InputPloticus_N.csv InputPloticus_O.csv InputPloticus_P.csv InputPloticusTemp.csv LanguageCodes.csv LanguageNamesViaPhp.csv LanguageNamesViaWpEn.csv LanguageNamesViaWpEnEdited.csv ME.csv Namespaces.csv N.csv PageViewsGrowthLastYear.csv PageViewsPerDayAll.csv PageViewsPerHourAll.csv PageViewsPerMonthAll.csv PageViewsPerMonthAllNormalized.csv PageViewsPerMonthAllTotalled.csv PageViewsPerMonthHtmlAllProjects.csv PageViewsPerWeekAll.csv PageViewsPerWeekdayAll.csv Participation.csv R_PlotData_Binaries.csv R_PlotData_Editors.csv R_PlotData_NewArticles.csv R_PlotData_PageViews.csv R_PlotData_Uploaders.csv R_PlotData_Uploads.csv StatisticsAccessLevels.csv StatisticsActiveUsers.csv StatisticsAnonymousUsers.csv StatisticsBots.csv StatisticsEditDistribution.csv StatisticsEditorsPerWiki.csv StatisticsEditsPerDay.csv StatisticsEditsPerNamespace.csv StatisticsEditsPerUsertype.csv StatisticsLog.csv StatisticsLogRunTime.csv StatisticsMonthly.csv StatisticsNewestDumps.csv StatisticsPageviewsPerWiki.csv StatisticsPerBinariesExtension.csv StatisticsPerNamespace.csv StatisticsPlotBinariesPerWiki.csv StatisticsPlotInput.csv StatisticsRevertsPerMonth.csv StatisticsSizeDistribution.csv StatisticsSleepingUsers.csv 
StatisticsTimelines.csv StatisticsUploadsPerWiki.csv StatisticsUserActivitySpread.csv StatisticsUsers.csv StatisticsWeekly.csv TempEditsPerArticle.csv TranslateWiki.csv UserActivityTrendsAllProjectsZZ.csv UserActivityTrendsEN.csv UserActivityTrendsNewBinariesWIKINAME.csv UserActivityTrendsTopUploadersWIKINAME.csv UserActivityTrendsUploadWizardWIKINAME.csv UserActivityTrendsWIKINAME.csv UserActivityTrendsZZ.csv WhiteListWikis.csv WikimediaGrowthStats.csv WikipediansContributors.csv WikipediansEditsGt100.csv WikipediansEditsGt5.csv WikipediansNew.csv ZeitGeist.csv ZeitGeist.csvBreakdownPerUserPerMonthWIKINAMEemp.csv
External links
- Statistics are in CSV format (can be imported into a spreadsheet)
- Index of all data files: Wikibooks, Wikinews, Wikipedia, Wikiquote, Wikisource, Wikiversity, Wikivoyage, Other projects (incl. Commons, Wikidata and Meta), Wiktionary