Jump to content

Toolserver/Database

From Meta, a Wikimedia project coordination wiki

Connecting from a program

[edit]
See tswiki:Database access#Program access.

Unfortunately, PHP doesn't support this method of connecting. If you use PHP, see /PHP for an alternative method.

GUI database access tool (PHPMyAdmin)

[edit]
See tswiki:Database access#phpMyAdmin.

Database schema and data format

[edit]
See tswiki:Database access#Database schema.

Wiki page text

[edit]
Note: The suggested way to access page text is via WikiProxy. The rest of this section only applies if you're using raw database access.

Direct access to page content is only available up to 2005-07-22. It is located in the 'cur' or 'text' tables and may be compressed or in object format.

  • Compressed text has 'gzip' in old_flags. This data is compressed with headerless zlib compression.
  • Text with "utf8" in old_flags is in UTF-8 encoding. Text without this flag may or may not be latin-1. (??? explain further)
  • Text with "object" in old_flags is encoded as a serialised PHP object. This may refer to:
    • cur stubs. the PHP object contains the cur_id for the relevant row in the cur table containing the actual text.
    • History blob stubs. this is concatenated compressed storage. (??? is this documented?)
    • Something else?
  • Text with "external" in old_flags is stored in separate external databases. This text is not yet available on the toolserver databases.

You can use the MediaWiki function Revision::getRevisionText to extract the actual text automatically (see Revision.php for details).

At some point, the missing old text will be imported from an XML dump, and compressed data will be uncompressed. However, this needs to wait until we have more disk space available.

Character encoding

[edit]

Note the following peculiarity about character encoding. While the toolserver database is running MySQL 5.0, the wiki data is still encoded in the way that was customary in MySQL 4.0: The tables are marked as using latin-1 encoding, but they actually contain UTF-8 encoded data. Current database drivers may need specific configuration for backward compatibility with this encoding scheme. Details depend on the driver used.

Connector/J

For Connector/J 5.1, the parameter "useOldUTF8Behavior=true" needs to be specified in order to activate the backward compatible method, along with other parameters that specify the encoding used. The following is an example of a properly configured JDBC URL:

jdbc:mysql://enwiki-p.db.toolserver.org/enwiki_p?useOldUTF8Behavior=true&useUnicode=true&characterEncoding=UTF-8&connectionCollation=utf8_general_ci

Notes/Warnings

[edit]
  • Timestamps are not numbers, they are strings consisting of numeric characters. Make sure to enclose timestamps in quotes (for example, rc_timestamp > '200500000000'), or your your query will run much slower (up to 50 or more times slower).
  • Zombie queries: If you accidently started a query that needs too much time you can get its thread_id with SHOW PROCESSLIST and terminate it with KILL thread_id. Typing ^C at the MySQL client prompt will also kill the currently executing query.
  • Bulk insert is much faster using tab-separated files than inserting single records: LOAD DATA LOCAL INFILE absolute-path-to-file IGNORE INTO TABLE table.
  • If you run slow queries, you may want to read [1] [2] for a way to make them faster, more detail at [3].

User databases

[edit]

You have access to your own user database called u_<username>. This is located on the sql alias ("sql" is a host name visible locally). The database is backed up nightly.

You also have access to create any database named u_<username>_<anything>. If <anything> ends in _p, the database will be accessible (read-only) to all users. This can be used to share data with other users. Example:

$ mysql -h sql 
mysql> create database u_myownname_someschema_p;

Because the sql alias is not a dedicated server, your username database is on the same server as one or more clusters. For example, sql and sql-s1 might be the same server. (But this is subject to change at any time.)

On the other servers (which are not sql), you also have access to create u_<username> and u_<username>_<anything>. However, these are not backed up. They are intended to be used for storing temporary data while using the wiki databases.

Status files

[edit]
See tswiki:Database access#Database status files.