
List of Wikipedias by sample of articles/Source code (original)

From Meta, a Wikimedia project coordination wiki

This is the source code I'm currently using for the statistics at the List of Wikipedias by sample of articles. A few suggestions for improvement have already been made on the talk page -- I'll work on a new version when I have time. Of course, other interested people are welcome to work on it too.

The script is simple and straightforward. (NB: the updated page names for the List of articles every Wikipedia should have at Meta are not in this script; rather, they are listed in the outdated file called 'yegedalised.txt' (= list of articles). The current contents of this file, now being used for updates, are listed after the source code.) Note that the variable names and messages are in Volapük. If this makes understanding difficult, please feel free to contact me. If there are any obvious problems or bugs I have missed, please let me know. --Smeira 16:37, 29 November 2007 (UTC)

Note: pukalised contains only five Wikipedias; each time I run the script, I replace them with the ones I want to look at.


# -*- coding: utf_8 -*-
import sys
sys.path.append('c:\\Sergio\\Python2.5\\pywikipedia')
import wikipedia
import pagegenerators
import catlib
# English Wikipedia: the article list uses English titles
lingl = wikipedia.Site('en', 'wikipedia')
# Language codes of the Wikipedias to examine (edited before each run)
pukalised = ['en', 'de', 'fr', 'it', 'ja']
# pukataib[0] holds article titles; pukataib[n] holds article sizes for the n-th language
pukataib = [[] for i in range(31)]
# sekataib[n] holds [absent, <10k, 10-30k, >30k, average size] for the n-th language
sekataib = [[] for i in range(31)]
npuks = len(pukalised)
pukanum = 1
for puk in pukalised:
    pukavuk = wikipedia.Site(puk, 'wikipedia')
    if puk == 'en': pukavuk = lingl
    ragiv = open("C:\\Sergio\\Python2.5\\pywikipedia\\xxxfiles\\yegedalised.txt")
    yegedanum = 0
    zaned = 0
    sekataib[pukanum] = [0, 0, 0, 0, 0]
    print u'\n======================\nPÜKAMALAT: ',puk,'\n======================\n'
    for lien in ragiv:
        # Strip only the line ending, so a last line without a newline keeps its final character
        yeged = lien.rstrip('\r\n').decode('cp1252')
        pukataib[0].insert(yegedanum, yeged)
        linglapad = wikipedia.Page(lingl, yeged)
        linglavodem = linglapad.get()
        pukavodem = linglavodem
        if puk != 'en':
            plad1 = linglavodem.find(u'[[' + puk + u':')
            if plad1 > -1:  # find() returns -1 when the link is absent
                plad1 = linglavodem.find(u':', plad1)
                pladf = linglavodem.find(u']]', plad1)
                pukayeged = linglavodem[plad1+1:pladf]
                pukapad = wikipedia.Page(puk, pukayeged)
                pukavodem = pukapad.get(get_redirect=True)
                if pukavodem.find(u'#REDIRECT') > -1:
                    plad1 = pukavodem.find('[[')
                    pladf = pukavodem.find(']]', plad1)
                    pukayeged = pukavodem[plad1+2:pladf]
                    pukapad = wikipedia.Page(puk, pukayeged)
                    pukavodem = pukapad.get()
            else:
                pukavodem = ''
        gretot = len(pukavodem)
        pukataib[pukanum].insert(yegedanum, gretot)
        if gretot == 0:
            sekataib[pukanum][0] = sekataib[pukanum][0] + 1
        # Sizes of exactly 10000 or 30000 go to the higher bucket instead of being dropped
        elif gretot < 10000:
            sekataib[pukanum][1] = sekataib[pukanum][1] + 1
        elif gretot < 30000:
            sekataib[pukanum][2] = sekataib[pukanum][2] + 1
        else:
            sekataib[pukanum][3] = sekataib[pukanum][3] + 1
        print pukataib[0][yegedanum], pukataib[pukanum][yegedanum]
        zaned = zaned + gretot
        yegedanum = yegedanum + 1
    ragiv.close()
    # yegedanum now equals the number of articles read, so it is the divisor for the average
    sekataib[pukanum][4] = int(zaned / yegedanum)
    print sekataib[pukanum]
    pukanum = pukanum + 1
print '\n\n'
print 'SEKATAIB PEKALKULON.\n------- ------------\n\n'
print u'Pük:',' N/Db ',' <10k ','10-30k',' >30k '
for puk in range(npuks):
    grad = 0
    print pukalised[puk].ljust(4),
    for num in range(4):
        volad = sekataib[puk+1][num]
        print str(volad).rjust(6),
        grad = grad + volad*((num)**2)
    print ' .... ', grad, ' (yegedagretot zanedik: ', sekataib[puk+1][4],' b).' 
print
print 'KALKULAM EFINIKON...'
print

vtaib = '{|border="1" cellpadding="2" cellspacing="0" style="width:100%; background: #f9f9f9; border: 1px solid #aaaaaa; border-collapse: collapse; white-space: nowrap; text-align: center"'
vtaib = vtaib + '\n|-\n'
vtaib = vtaib + u'!width = 15 | № !! width = 25 | Lang. !! width = 150 | Average Article Size<br>(chars) !! width = 70 | Absent<br>(0k) !! width=70| Stubs<br>(< 10k)!! width = 70 | Articles<br>(10-30k) !! width = 70 | Long Art.<br>(> 30k) !! Score'
vtaib = vtaib + '\n|-\n'
for puk in range(npuks):
    grad = 0
    vtaib = vtaib + '|' + str(puk+1) + '\n'
    vtaib = vtaib + '| [[:' + pukalised[puk] + ':|' + pukalised[puk] + ']]\n'
    vtaib = vtaib + '| ' + str(sekataib[puk+1][4]) + '\n'
    for num in range(4):
        volad = sekataib[puk+1][num]
        vtaib = vtaib + '| ' + str(volad) + '\n'
        grad = grad + volad*((num)**2)
    vtaib = vtaib + '| ' + str(grad) + '\n|-\n'
vtaib = vtaib[:-2] + '}'
print vtaib
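
The two text-processing steps in the script above (finding the interlanguage link in the English wikitext, then classifying the article by size and computing the score) can be tried without pywikipedia. The following is only a minimal Python 3 sketch, not the script itself: the function names `interwiki_title`, `size_bucket` and `score` are hypothetical, the regular expression stands in for the script's `find()` calls, and placing sizes of exactly 10000 and 30000 in the higher bucket is an assumption.

```python
import re

def interwiki_title(wikitext, lang):
    """Return the title from a [[lang:Title]] link, or None if there is none."""
    pattern = r"\[\[" + re.escape(lang) + r":([^\]|]+)\]\]"
    match = re.search(pattern, wikitext)
    return match.group(1).strip() if match else None

def size_bucket(length):
    """Map a text length to 0 (absent), 1 (<10k), 2 (10-30k) or 3 (30k and up)."""
    if length == 0:
        return 0
    if length < 10000:
        return 1
    if length < 30000:
        return 2
    return 3  # assumption: boundary values fall into the higher bucket

def score(counts):
    """Weight each bucket count by the square of its index: 0, 1, 4, 9."""
    return sum(count * index ** 2 for index, count in enumerate(counts))

# Small demonstration with made-up wikitext and article lengths.
sample = "Some text.\n[[de:Beispiel]]\n[[fr:Exemple]]\n"
print(interwiki_title(sample, "fr"))   # Exemple

counts = [0, 0, 0, 0]
for length in [0, 500, 12000, 45000, 30000]:
    counts[size_bucket(length)] += 1
print(counts, score(counts))           # [1, 1, 1, 2] 23
```

The weights mirror the `volad*((num)**2)` expression in the script: absent articles add nothing, stubs add 1 each, mid-sized articles 4, and long articles 9.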

List of articles


Here is the content of the yegedalised.txt (list of articles) file.

Brigitte Bardot
Sarah Bernhardt
Marlon Brando
Charlie Chaplin
Marlene Dietrich
Marx Brothers
Marilyn Monroe
Sandro Botticelli
Pieter Bruegel the Elder
Le Corbusier
Leonardo da Vinci
Salvador Dalí
Donatello
Albrecht Dürer
Vincent van Gogh
Francisco Goya
Frida Kahlo
Henri Matisse
Michelangelo
Pablo Picasso
Jackson Pollock
Raphael
Rembrandt
Diego Velázquez
Andy Warhol
Frank Lloyd Wright
Peter Paul Rubens
Abu Nuwas
Arnaut Daniel
Matsuo_Bash%C5%8D
Samuel Beckett
Jorge Luis Borges
George Gordon Byron, 6th Baron Byron
Luís de Camões
Miguel de Cervantes
Geoffrey Chaucer
Anton Chekhov
Dante Alighieri
Rubén Darío
Charles Dickens
Fyodor Dostoevsky
Ferdowsi
Fuzûlî
Gabriel García Márquez
Johann Wolfgang von Goethe
Homer
Horace
Victor Hugo
Henrik Ibsen
James Joyce
Franz Kafka
K%C4%81lid%C4%81sa
Omar Khayyám
Li Bai
Naguib Mahfouz
John Milton
Molière
Vladimir Nabokov
Ovid
Edgar Allan Poe
Munshi Premchand
Marcel Proust
Alexander Pushkin
Arthur Rimbaud
Shota Rustaveli
José Saramago
Sappho
William Shakespeare
Sophocles
Snorri Sturluson
J. R. R. Tolkien
Leo Tolstoy
Mark Twain
Virgil
Oscar Wilde
Wu Cheng'en
William Butler Yeats
Johann Sebastian Bach
The Beatles
Ludwig van Beethoven
Hector Berlioz
Anton Bruckner
Johannes Brahms
Pyotr Ilyich Tchaikovsky
Frédéric Chopin
Anton%C3%ADn Dvo%C5%99%C3%A1k
George Frideric Handel
Jimi Hendrix
Michael Jackson
Madonna (entertainer)
Gustav Mahler
Wolfgang Amadeus Mozart
Giacomo Puccini
Elvis Presley
The Rolling Stones
Franz Schubert
Bed%C5%99ich Smetana
Robert Schumann
Jean Sibelius
Igor Stravinsky
Giuseppe Verdi
Antonio Vivaldi
Richard Wagner
Roald Amundsen
Neil Armstrong
Jacques Cartier
Christopher Columbus
James Cook
Hernán Cortés
Yuri Gagarin
Vasco da Gama
Ferdinand Magellan
Marco Polo
Zheng He
Alexander von Humboldt
Ingmar Bergman
Walt Disney
Federico Fellini
Alfred Hitchcock
Stanley Kubrick
Akira Kurosawa
George Lucas
Steven Spielberg
Archimedes
Alexander Graham Bell
Tim Berners-Lee
Tycho Brahe
Nicolaus Copernicus
Marie Curie
Charles Darwin
Thomas Edison
Albert Einstein
Euclid
Leonhard Euler
Michael Faraday
Enrico Fermi
Fibonacci
Henry Ford
Joseph Fourier
Galileo Galilei
Carl Friedrich Gauss
Johannes Gutenberg
Ernst Haeckel
James Prescott Joule
Johannes Kepler
John Maynard Keynes
Muhammad ibn M%C5%ABs%C4%81 al-Khw%C4%81rizm%C4%AB
Gottfried Leibniz
Carl Linnaeus
James Clerk Maxwell
Dmitri Mendeleev
Antonio Meucci
Isaac Newton
Blaise Pascal
Louis Pasteur
Max Planck
Ernest Rutherford
Erwin Schrödinger
Richard Stallman
Nikola Tesla
Alan Turing
James Watt
Wright brothers
Thomas Aquinas
Aristotle
Augustine of Hippo
Avicenna
Giordano Bruno
Simone de Beauvoir
Noam Chomsky
René Descartes
Émile Durkheim
Francis of Assisi
Sigmund Freud
Georg Wilhelm Friedrich Hegel
Herodotus
Hippocrates
Immanuel Kant
John Locke
Martin Luther
Rosa Luxemburg
Niccolò Machiavelli
Karl Marx
Friedrich Nietzsche
Paul the Apostle
Plato
Pythagoras
Jean-Jacques Rousseau
Jean-Paul Sartre
Adam Smith
Socrates
Sun Tzu
Voltaire
Max Weber
Ludwig Wittgenstein
Akbar the Great
Alexander the Great
Mustafa Kemal Atatürk
Augustus
David Ben-Gurion
Otto von Bismarck
Simón Bolívar
Napoleon I of France
George W. Bush
Julius Caesar
Charlemagne
Winston Churchill
Empress Dowager Cixi
Cleopatra VII
Constantine I
Charles de Gaulle
Indira Gandhi
Elizabeth I of England
Genghis Khan
Haile Selassie I of Ethiopia
Hirohito
Adolf Hitler
Vladimir Lenin
Louis XIV of France
Nelson Mandela
Mao Zedong
Benito Mussolini
Kwame Nkrumah
Peter I of Russia
Qin Shi Huang
Saladin
Joseph Stalin
Margaret Thatcher
Harry S. Truman
Victoria of the United Kingdom
George Washington
Abraham
Moses
Jesus
Muhammad
Gautama Buddha
Osama bin Laden
Mohandas Karamchand Gandhi
Emma Goldman
Joan of Arc
Helen Keller
Martin Luther King, Jr.
Mother Teresa
Florence Nightingale
Rosa Parks
Che Guevara
History
Prehistory
Stone Age
Bronze Age
Iron Age
Mesopotamia
Ancient Egypt
Ancient Greece
Roman Empire
Age of Enlightenment
Aztec
Byzantine Empire
Crusades
Holy Roman Empire
Hundred Years' War
Middle Ages
Mongol Empire
Ming Dynasty
Ottoman Empire
Protestant Reformation
Renaissance
Thirty Years' War
Viking
American Civil War
History of South Africa in the Apartheid era
British Empire
Cold War
French Revolution
Great Depression
Gulf War
The Holocaust
Industrial Revolution
Korean War
Nazi Germany
Russian Revolution (1917)
Qing Dynasty
Spanish Civil War
Treaty of Versailles
Vietnam War
World War I
World War II
Geography
Capital
City
Continent
Country
Desert
Earth science
Map
North Pole
Ocean
Rainforest
River
Sea
South Pole
Africa
Antarctica
Asia
Europe
Latin America
Middle East
North America
Oceania
South America
Afghanistan
Algeria
Argentina
Australia
Austria
Bangladesh
Belgium
Brazil
Canada
China
People's Republic of China
Democratic Republic of the Congo
Egypt
Ethiopia
France
Germany
Greece
India
Indonesia
Iran
Iraq
Republic of Ireland
Israel
Italy
Japan
Mexico
Netherlands
Pakistan
Poland
Russia
Saudi Arabia
Singapore
South Africa
South Korea
Spain
Sudan
Switzerland
Tanzania
Thailand
Turkey
Ukraine
United Kingdom
United States
Vietnam
Portugal
Amsterdam
Athens
Baghdad
Bangkok
Beijing
Beirut
Berlin
Brisbane
Brussels
Buenos Aires
Cairo
Canberra
Cape Town
Chicago
Damascus
Dar es Salaam
Dublin
Edinburgh
Florence
Hong Kong
Istanbul
Jakarta
Jerusalem
Karachi
Kyoto
Los Angeles, California
London
Mecca
Melbourne
Mexico City
Milan
Moscow
Mumbai
Nairobi
Naples
New Delhi
New York City
Paris
Rio de Janeiro
Rome
Seoul
Shanghai
Singapore
Sydney
Tehran
Tel Aviv
Tokyo
Venice
Vienna
Washington, D.C.
Amazon River
Aral Sea
Arctic Ocean
Atlantic Ocean
Baltic Sea
Black Sea
Caribbean Sea
Caspian Sea
Congo River
Danube
Dead Sea
Euphrates
Ganges
Great Barrier Reef
Great Lakes
Indian Ocean
Indus River
Lake Baikal
Lake Tanganyika
Lake Titicaca
Lake Victoria
Mediterranean Sea
Mississippi River
Niagara Falls
Niger River
Nile
North Sea
Pacific Ocean
Panama Canal
Rhine
Suez Canal
Southern Ocean
Tigris
Volga River
Yangtze River
Alps
Andes
Himalayas
Mount Kilimanjaro
Mount Everest
Rocky Mountains
Sahara
Society
Civilization
Education
Family
Child
Man
Marriage
Woman
Behavior
Emotion
Love
Thought
Politics
Anarchism
Colonialism
Communism
Conservatism
Democracy
Dictatorship
Diplomacy
Fascism
Globalization
Government
Ideology
Imperialism
Liberalism
Marxism
Monarchy
Nationalism
Nazism
Republic
Socialism
State
Political party
Propaganda
Economics
Macroeconomics
Microeconomics
Agriculture
Capital (economics)
Capitalism
Currency
Euro
Japanese yen
United States dollar
Industry
Money
Tax
Law
Constitution
African Union
Arab League
Association of Southeast Asian Nations
Commonwealth of Independent States
Commonwealth of Nations
European Union
International Red Cross and Red Crescent Movement
NATO
Nobel Prize
OPEC
United Nations
International Atomic Energy Agency
International Court of Justice
International Monetary Fund
UNESCO
Universal Declaration of Human Rights
World Health Organization
World Bank Group
World Trade Organization
Civil war
Military
Peace
War
Abortion
Capital punishment
Human rights
Racism
Slavery
Culture
Art
Comics
Painting
Photography
Sculpture
Pottery
Dance
Fashion
Theatre
Cannes Film Festival
Language
Alphabet
Chinese character
Cyrillic alphabet
Greek alphabet
Latin alphabet
Letter (alphabet)
Grammar
Noun
Syntax
Verb
Linguistics
Literacy
Literature
Prose
Fiction
Novel
One Thousand and One Nights
Poetry
Epic of Gilgamesh
Iliad
Mah%C4%81bh%C4%81rata
Ramayana
Pronunciation
Arabic language
Bengali language
Chinese language
English language
Esperanto
French language
German language
Greek language
Hebrew language
Hindi
Interlingua
Italian language
Japanese language
Latin
Persian language
Russian language
Sanskrit
Spanish language
Tamil language
Turkish language
Word
Writing
Architecture
Arch
Bridge
Canal
Dam
Dome
House
Aswan Dam
Colosseum
Great Wall of China
Eiffel Tower
Empire State Building
Hagia Sophia
Parthenon
Giza pyramid complex
St. Peter's Basilica
Taj Mahal
Pyramid
Tower
Film
Animation
Radio
Television
Music
Song
Blues
Classical music
Opera
Symphony
Electronic music
Folk music
Jazz
Pop music
Reggae
Rhythm and blues
Rock and roll
Hard rock
New Age music
Drum
Flute
Guitar
Piano
Trumpet
Violin
Game
Backgammon
Chess
Go (board game)
Playing card
Gambling
Martial arts
Judo
Karate
Olympic Games
Sport
American football
Auto racing
Badminton
Baseball
Basketball
Cricket
Fencing
Football (soccer)
Golf
Horse racing
Ice hockey
Tennis
Rugby union
Wrestling
Athletics (track and field)
Toy
Deity
God
Mythology
Atheism
Fundamentalism
Materialism
Monotheism
Polytheism
Soul
Religion
Bahá'í Faith
Buddhism 
Christianity
Roman Catholic Church
Confucianism
Hinduism
Islam
Jainism 
Judaism
Shinto
Sikhism
Taoism 
Haitian Vodou
Zoroastrianism
Spirituality
Philosophy
Beauty
Dialectic
Ethics (philosophy)
Epistemology
Feminism
Free will
Knowledge
Logic
Mind
Morality
Reality
Truth
Science
Astronomy
Asteroid
Big Bang
Black hole
Comet
Galaxy
Milky Way
Light-year
Moon
Planet
Earth
Jupiter
Mars
Mercury (planet)
Neptune
Saturn
Uranus
Venus
Solar System
Star
Sun
Universe
Biology
DNA
Enzyme
Protein
Botany
Death
Suicide
Ecology
Endangered species
Domestication
Life
Scientific classification
Species
Metabolism
Digestion
Photosynthesis
Respiration (physiology)
Evolution
Reproduction
Asexual reproduction
Sexual reproduction
Heterosexuality
Homosexuality
Pregnancy
Sex
Female
Male
Sexual intercourse
Anatomy
Cell
Circulatory system
Blood
Heart
Endocrine system
Gastrointestinal tract
Colon (anatomy)
Small intestine
Liver
Integumentary system
Breast
Skin
Muscle
Nervous system
Brain
Sensory system
Auditory system
Ear
Gustatory system
Olfactory system
Somatosensory system
Visual system
Eye
Reproductive system
Penis
Vagina
Respiratory system
Lung
Skeleton
Medicine
Addiction
Alcoholism
Drug addiction
Alzheimer's disease
Cancer
Cholera
Acute Viral Nasopharyngitis (Common Cold)
Dentistry
Disability
Blindness
Hearing impairment
Mental disorder
Disease
Medication
Ethanol
Nicotine
Tobacco
Drug
Health
Headache
Myocardial infarction
Heart disease
Malaria
Malnutrition
Obesity
Pandemic
Penicillin
Pneumonia
Poliomyelitis
Sexually transmitted disease
AIDS
Stroke
Tuberculosis
Diabetes mellitus
Virus
Influenza
Smallpox
Organism
Animal
Arthropod
Insect
Ant
Bee
Butterfly
Arachnid
Chordate
Amphibian
Frog
Bird
Columbidae
Fish
Shark
Mammal
Ape
Human
Camel
Cat
Cattle
Dog
Dolphin
Elephant
Horse
Domestic sheep
Lion
Pig
Whale
Reptile
Dinosaur
Snake
Archaea
Bacteria
Fungus
Plant
Flower
Tree
Protist
Chemistry
Biochemistry
Chemical compound
Acid
Base (chemistry)
Salt
Chemical element
List of elements by name
Periodic table
Aluminium
Carbon
Copper
Gold
Helium
Hydrogen
Iron
Neon
Nitrogen
Oxygen
Silver
Tin
Zinc
Metal
Alloy
Steel
Organic chemistry
Alcohol
Carbohydrate
Hormone
Lipid
Phase (matter)
Gas
Liquid
Plasma (physics)
Solid
Avalanche
Climate
El Niño-Southern Oscillation
Global warming
Earthquake
Geology
Mineral
Diamond
Plate tectonics
Rock (geology)
Natural disaster
Volcano
Weather
Cloud
Flood
Tsunami
Rain
Acid rain
Snow
Tornado
Tropical cyclone
Physics
Acceleration
Atom
Energy
Conservation of energy
Electromagnetic radiation
Infrared
Visible spectrum
Color
Ultraviolet
Gamma ray
Force
Electromagnetism
Gravitation
Nuclear force
Light
Magnet
Magnetic field
Mass
Molecule
Quantum mechanics
Sound
Speed
Speed of light
Speed of sound
Theory of relativity
Time
Velocity
Weight
Length
Anno Domini
Calendar
Gregorian calendar
Century
Day
Minute
Millennium
Month
Time zone
Daylight saving time
Week
Year
Technology
Biotechnology
Clothing
Cotton
Engineering
Lever
Pulley
Screw
Wedge (mechanical device)
Wheel
Irrigation
Plough
Metallurgy
Nanotechnology
Communication
Book
Information
Encyclopedia
Journalism
Newspaper
Mass media
Printing
Rail transport
Telephone
Mobile phone
Electronics
Electric current
Frequency
Capacitor
Inductor
Transistor
Diode
Resistor
Transformer
Computer
Hard disk drive
Processor
Random access memory
Artificial intelligence
Information technology
Algorithm
Internet
E-mail
World Wide Web
Web browser
Operating system
Programming language
Computer software
User interface
Keyboard (computing)
Computer display
Mouse (computing)
Energy (society)
Renewable energy
Electricity
Nuclear power
Fossil fuel
Internal combustion engine
Steam engine
Fire
Glass
Paper
Plastic
Wood
Transport
Aircraft
Automobile
Bicycle
Boat
Ship
Train
Weapon
Axe
Explosive material
Gunpowder
Firearm
Machine gun
Nuclear weapon
Sword
Tank
Food
Bread
Cereal
Barley
Maize
Oat
Rice
Rye
Sorghum
Wheat
Cheese
Chocolate
Honey
Fruit
Apple
Banana
Grape
Legume
Soybean
Lemon
Nut (fruit)
Meat
Sugar
Vegetable
Potato
Beer
Wine
Coffee
Milk
Tea
Water
Juice
Mathematics
Algebra
Arithmetic
Axiom
Calculus
Geometry
Circle
Pi
Square
Triangle
Group theory
Mathematical proof
Number
Complex number
Integer
Natural number
Prime number
Rational number
Infinity
Set theory
Statistics
Trigonometry
Measurement
Joule
Kilogram
Litre
Metre
Newton
International System of Units
Volt
Watt
Second
Kelvin
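
A few entries in the list are stored percent-encoded (for example `Matsuo_Bash%C5%8D`), presumably because the script reads the file as cp1252, which cannot represent those characters. To see which titles they stand for, the escapes can be decoded as UTF-8. This is a minimal Python 3 sketch; the helper name `normalize_title` is hypothetical and not part of the original script.

```python
from urllib.parse import unquote

def normalize_title(raw):
    """Decode %XX escapes as UTF-8 and replace underscores with spaces."""
    return unquote(raw.strip()).replace("_", " ")

# Entries taken from the list above.
for entry in ["Matsuo_Bash%C5%8D", "K%C4%81lid%C4%81sa", "Anton%C3%ADn Dvo%C5%99%C3%A1k"]:
    print(normalize_title(entry))
```

Titles that contain no escapes or underscores pass through unchanged apart from the stripping of surrounding whitespace.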