
Wikimedia Quarto/2/tech/simple

From Meta, a Wikimedia project coordination wiki


Technical Development

Most of the below report has been written by James Day; the part on the Paris machines is largely from David Monniaux.
Information about our servers may be found at any time at Wikimedia servers. Developer activity falls into two main areas: server maintenance and development of the MediaWiki software, which is also used for many non-Wikimedia applications. Most developers (though not all, by their own choice) are listed here. One may show appreciation for their dedication with thank-you notes or financial support. Thank you!
Until now, all developers have been working for free, but that may change in the future to support our amazing growth.

Installation of Squid caches in France

The cluster near Paris.
Our servers are the three in the middle:
(from top to bottom: bleuenn, chloe, ennael.)


On December 18, 2004, three donated servers were installed at a colocation facility in Aubervilliers, a suburb of Paris, France. They are named bleuenn, chloe and ennael by donor request. For the technically minded, the machines are HP sa1100 1U servers with 640 MiB of RAM, 20 GB ATA hard disks and 600 MHz Celeron processors.

The machines are to be equipped with Squid caching software. They will serve as a testbed for the technique of placing Web caches nearer to users in order to reduce latency. Typically, users in France on DSL Internet connections can reach these machines with about 30 ms of latency, while reaching the main cluster of Wikimedia servers in Florida takes about 140 ms. The idea is that users in parts of Europe will use the Squid caches in France, cutting access delays by roughly 1/10 second, both for multimedia content for all users and for page content for anonymous users. Logged-in users will not benefit as much, since pages are generated specifically for them and thus are not cached across users. If a page is not in a Squid cache, or is requested by a logged-in user, the Apache web servers must spend 1/5 to 3 or more seconds, plus database time, to build the page. Database time is about 1/20 second for simple queries but can be many seconds for categories, or even 100 seconds for a very large watchlist.
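The latency figures above can be combined into a rough back-of-the-envelope model of what an edge cache buys an anonymous reader. The sketch below uses the round-trip times quoted in the text (30 ms to Paris, 140 ms to Florida) and a 200 ms Apache render time at the low end of the 1/5-to-3-second range; the cache hit rates passed in are purely hypothetical, not measured figures from these servers.

```python
# Rough model of expected time-to-first-byte for an anonymous page view,
# using the latency figures quoted in the text. Hit rates are hypothetical.

def expected_latency_ms(hit_rate, edge_rtt_ms=30.0, origin_rtt_ms=140.0,
                        render_ms=200.0):
    """Expected response time in milliseconds.

    A cache hit is served from the Paris edge in roughly one round trip;
    a miss pays the trip to Florida plus Apache render time.
    """
    hit_cost = edge_rtt_ms
    miss_cost = origin_rtt_ms + render_ms
    return hit_rate * hit_cost + (1.0 - hit_rate) * miss_cost

print(expected_latency_ms(0.0))  # no cache: every request goes to Florida
print(expected_latency_ms(0.8))  # hypothetical 80% hit rate
```

Under these assumptions, even a modest hit rate cuts the average well below the uncached 340 ms, which is why only anonymous (cacheable) page views see the full benefit.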

The Telecity data center

The Squid caches were activated in early January 2005, and an experimental period ensued. As of January 31, the machines cache English, French and multimedia content for Belgium, France, Luxembourg, Switzerland and the United Kingdom. The system is still somewhat experimental, and it is expected that caching performance can be increased with some tuning. The installation of similar caching clusters in other countries is being considered.

Installation of more servers in Florida


In mid-October, two more dual-Opteron database slave servers, each with six drives in RAID 0 and 4 GB of RAM, plus five 3 GHz/1 GB RAM Apache servers, were ordered. Delays due to compatibility problems, which the vendor had to resolve before shipping the database servers, left the site short of database power; until early December, the search function had to be turned off at times.

In November 2004, five Web servers, four of them with high RAM (working memory) capacity used for Memcached or Squid caching, experienced failures. This at times resulted in very slow wikis.

Five 3 GHz/3 GB RAM servers were ordered in early December. Four of the December machines will provide Squid and Memcached service as improved replacements for the failing machines until those are repaired. One machine with SATA drives in RAID 0 will be used as a testbed to see how much load such less costly database servers can handle, while also providing another option for a backup-only database slave that also runs Apache. These machines are equipped with a new option, a remote power and server health monitoring board, at $60 extra cost. This option was taken for this order to allow a comparison of the effectiveness of the monitoring board against a remote power strip and more limited monitoring tools. Remote power and health reporting reduces the need for colocation facility labor, which can involve costs, delays or both.

A further order of one master database server, to permit a split of the database servers into two sets, each with a master and a pair of slaves and each holding about half of the project activity, as well as five more Apache servers, is planned for the end of the quarter or the first days of the next. This order will use the remainder of the US$50,000 from the last fundraising drive. The split will halve the amount of disk writing each set must do, leaving more capacity for the disk reads needed to serve user requests. The split is intended to happen in about three months, after the new master has proved its reliability during several months of service as a database slave.
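The two-set arrangement described above can be sketched as a simple routing table: each wiki is pinned to one set, writes go to that set's master, and reads are spread over its slaves. This is an illustrative sketch only; the server names and the assignment of projects to sets are hypothetical, not the actual Wikimedia configuration.

```python
# Illustrative sketch of routing wikis to two database sets, each with a
# master and a pair of slaves. All names and assignments are hypothetical.

DB_SETS = {
    "set1": {"master": "db1", "slaves": ["db2", "db3"]},
    "set2": {"master": "db4", "slaves": ["db5", "db6"]},
}

# Hypothetical assignment putting roughly half the activity in each set.
WIKI_TO_SET = {
    "enwiki": "set1",
    "commonswiki": "set1",
    "frwiki": "set2",
    "dewiki": "set2",
}

def server_for(wiki, write=False):
    """Writes must hit the set's master; reads can use any slave in the set.

    Because each master only receives writes for its own half of the
    projects, each set does about half the disk writing it would
    otherwise, freeing capacity for reads.
    """
    db_set = DB_SETS[WIKI_TO_SET[wiki]]
    if write:
        return db_set["master"]
    return db_set["slaves"][hash(wiki) % len(db_set["slaves"])]
```

For example, `server_for("enwiki", write=True)` returns set1's master, while reads for the same wiki land on one of its slaves.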

Increased traffic and connectivity


Traffic grew during the third quarter from about 400-500 requests per second at the start to about 800 per second at the end. In the early fourth quarter it rose further, often exceeding 900 requests per second, with daily peak traffic hours in the 1,000 to 1,100 requests per second range; it then steadied at about 900 and rose slowly, due to the end of the back-to-school surge, slower-than-desired response times, or both ([1]). Bandwidth use grew from an average of about 32 megabits per second at the start of the quarter to about 43 megabits per second at the end. Typical daily highs are about 65-75 megabits per second and sometimes briefly hit the 100 megabits per second limit of a single outgoing Ethernet connection. Dual 100-megabit connections were used temporarily, and a gigabit fiber connection has been arranged at the Florida colocation facility, with the required parts ordered.