Jump to content

WSoR datasets/categorylinks

From Meta, a Wikimedia project coordination wiki

This table contains all of the categorylinks related to WikiProjects (e.g. matching the patter ^WikiProject_%). It is used to generated other WikiProject tables without having to do a number of sub selects.

Location

[edit]

db42:shawn.categorylinks

Fields

[edit]
shawn@internproxy:~$ mysql -h db42 -e "EXPLAIN categorylinks;SELECT * FROM categorylinks LIMIT 3" shawn
+-------------------+------------------------------+------+-----+-------------------+-------+
| Field             | Type                         | Null | Key | Default           | Extra |
+-------------------+------------------------------+------+-----+-------------------+-------+
| cl_from           | int(8) unsigned              | NO   | PRI | 0                 |       |
| cl_to             | varbinary(255)               | NO   | PRI |                   |       |
| cl_sortkey        | varbinary(230)               | NO   | MUL |                   |       |
| cl_timestamp      | timestamp                    | NO   |     | CURRENT_TIMESTAMP |       |
| cl_sortkey_prefix | varbinary(255)               | NO   |     |                   |       |
| cl_collation      | varbinary(32)                | NO   | MUL |                   |       |
| cl_type           | enum('page','subcat','file') | NO   |     | page              |       |
| week              | int(11)                      | YES  | MUL | NULL              |       |
| year              | int(4)                       | YES  | MUL | NULL              |       |
| week_start        | timestamp                    | YES  | MUL | NULL              |       |
+-------------------+------------------------------+------+-----+-------------------+-------+
+---------+-------------------------------------+-----------------+---------------------+-------------------+--------------+---------+------+------+---------------------+
| cl_from | cl_to                               | cl_sortkey      | cl_timestamp        | cl_sortkey_prefix | cl_collation | cl_type | week | year | week_start          |
+---------+-------------------------------------+-----------------+---------------------+-------------------+--------------+---------+------+------+---------------------+
|     354 | WikiProject_Algeria_articles        | ALGERIA         | 2011-06-18 10:52:39 |                   | uppercase    | page    |   24 | 2011 | 2011-06-13 00:00:00 |
|     354 | WikiProject_Countries_articles      | ALGERIA         | 2011-06-18 10:52:39 |                   | uppercase    | page    |   24 | 2011 | 2011-06-13 00:00:00 |
|     672 | WikiProject_Historic_sites_articles | ARC DE TRIOMPHE | 2009-05-30 13:24:33 |                   | uppercase    | page    |   21 | 2009 | 2009-05-25 00:00:00 |
+---------+-------------------------------------+-----------------+---------------------+-------------------+--------------+---------+------+------+---------------------+

Each row represents a a categorylink that starts with 'WikiProject_'. The table uses the the same structure as MediaWiki's [[1]].

cl_from
Stores the page_id of the article where the link was placed.
cl_to
Stores the name (excluding namespace prefix) of the desired category. Spaces are replaced by underscores (_)
cl_sortkey
Stores the title by which the page should be sorted in a category list.
cl_timestamp
Stores the time at which that link was last updated in the table.

There are three indexes which help improve performance:

  • The concatenation of cl_from and cl_to (for when an article is edited)
  • The concatenation of cl_to and the first 128 bytes of cl_sortkey (for displaying articles in order)
  • The concatenation of cl_to and cl_timestamp

Reproduction

[edit]

CREATE TABLE `categorylinks` (

 `cl_from` int(8) unsigned NOT NULL DEFAULT '0',
 `cl_to` varbinary(255) NOT NULL DEFAULT ,
 `cl_sortkey` varbinary(230) NOT NULL DEFAULT ,
 `cl_timestamp` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
 `cl_sortkey_prefix` varbinary(255) NOT NULL DEFAULT ,
 `cl_collation` varbinary(32) NOT NULL DEFAULT ,
 `cl_type` enum('page','subcat','file') NOT NULL DEFAULT 'page',
 `week` int(11) DEFAULT NULL,
 `year` int(4) DEFAULT NULL,
 `week_start` timestamp NULL DEFAULT NULL,
 UNIQUE KEY `cl_from` (`cl_from`,`cl_to`),
 KEY `cl_timestamp` (`cl_to`,`cl_timestamp`),
 KEY `cl_collation` (`cl_collation`),
 KEY `cl_sortkey` (`cl_to`,`cl_type`,`cl_sortkey`,`cl_from`),
 KEY `categorylinks_clsortkey_idx` (`cl_sortkey`),
 KEY `week_indx` (`week`),
 KEY `year_indx` (`year`),
 KEY `wk_start_indx` (`week_start`)

);

INSERT INTO categorylinks SELECT * FROM enwiki.categorylinks WHERE cl_to LIKE 'WikiProject_%'

Dependencies

[edit]

This table is generated from enwiki.categorylinks.

Notes

[edit]

This table was generated on 2011-07-08 07:39:23 from enwiki.