Jump to content

Research:Rolling re-activated editor

From Meta, a Wikimedia project coordination wiki
Rolling re-activated editor
Specification
A is a user who completed less than edits between and and completed edits (but was not a R:newly registered user) between and .
WMF Standard
  • = 5 edits
  • = 30 days
Status
SQL
SET @n = 5; /* edits threshold */
SET @u = 30; /* activity unit in days */
SET @T = "20140102"; /* February 1st, 2014 before midnight */
 
/* Create a temp table of the current month's active editors who are not new */
CREATE TEMPORARY TABLE staging.current_active_non_new
SELECT
    rev_user
FROM (
    SELECT
        rev_user
    FROM (
        SELECT
            rev_user
        FROM revision
        WHERE rev_timestamp BETWEEN
            DATE_FORMAT(DATE_SUB(@T, INTERVAL @u DAY), "%Y%m%d%H%i%S") AND @T
        UNION ALL
        SELECT
            ar_user AS rev_user
        FROM archive
        WHERE ar_timestamp BETWEEN
            DATE_FORMAT(DATE_SUB(@T, INTERVAL @u DAY), "%Y%m%d%H%i%S") AND @T
    ) AS split_revisions
    GROUP BY rev_user
    HAVING COUNT(*) >= @n
) AS active_editors
LEFT JOIN logging nru ON
    log_type = "newusers" AND
    log_action = "create" AND
    log_user = rev_user
WHERE 
    nru.log_id IS NULL OR 
    nru.log_timestamp <= DATE_FORMAT(
        DATE_SUB(@T, INTERVAL @u DAY), "%Y%m%d%H%i%S");

/* Create a temp table of the last month's active editors */
CREATE TEMPORARY TABLE staging.last_active
SELECT
    rev_user
FROM (
    SELECT
        rev_user
    FROM revision
    WHERE rev_timestamp BETWEEN
        DATE_FORMAT(DATE_SUB(@T, INTERVAL @u*2 DAY), "%Y%m%d%H%i%S") AND
        DATE_FORMAT(DATE_SUB(@T, INTERVAL @u DAY), "%Y%m%d%H%i%S")
    UNION ALL
    SELECT
        ar_user AS rev_user
    FROM archive
    WHERE ar_timestamp BETWEEN
        DATE_FORMAT(DATE_SUB(@T, INTERVAL @u*2 DAY), "%Y%m%d%H%i%S") AND
        DATE_FORMAT(DATE_SUB(@T, INTERVAL @u DAY), "%Y%m%d%H%i%S")
) AS split_revisions
GROUP BY rev_user
HAVING COUNT(*) >= @n;

SELECT
    user_id,
    user_name,
    user_registration
FROM staging.current_active_non_new
INNER JOIN user ON rev_user = user_id
WHERE 
    rev_user NOT IN (SELECT rev_user FROM staging.last_active) AND
    rev_user NOT IN (SELECT ug_user FROM user_groups WHERE ug_group = "bot");

Rolling re-activated editors are non-bot editors who were not recently active in the previous time period, but are active in the current time period.

Discussion

[edit]

The n edits threshold

[edit]

The edits necessary to cross this threshold, the fewer editors will meet the criteria. Historically, "active editors" were considered to be editors who made 5 edits to specific types of pages over the course of a calendar month. So for historical purposes, we recommend .

The u activity unit

[edit]

The wider this unit is set, the more casual editors will be considered "active". Historically, "active editors" were considered during calendar months. So for historical purposes, we recommend .

Time lag

[edit]

This metric has a built-in time lag of for the time period in question.

Bot filtering

[edit]

Bots are filtered using the bot flag method.

Analysis

[edit]

Discussion

[edit]

Notes

[edit]