User talk:Halfak (WMF)/New page creations, deletions, and drafts
Archive
2013
- Wednesday, November 13th - Page curation, deletion event, and AfC workflow
- Notes for digging up AFC history - Notes on AfC history & RFCs with links
- Thursday, November 14th - Page moves, log_page, and redirects
- Friday, November 15th - Creating revision stats
- Tuesday, November 19th - Filtering for newcomers and looking for relevant page moves
- Thursday, November 21st - Newcomers and account creation action. 2008 archive data bug detected.
- Friday, November 22nd - Flawed first look at article survival in enwiki
- Monday, November 25th - Gathering article survival from non-English Wikipedias
- Monday, December 2nd - Declining survival of articles in frwiki and ruwiki.
- Friday, December 6th - Work to fix flawed enwiki analysis with Articles for creation data
- Monday, December 9th - Fixed enwiki analysis shows declining article survival like the non-English wikis
- Tuesday, December 10th - Converting move log to original title and namespace
- Wednesday, December 11th - More page origins
- Thursday, December 12th - Dealing with drafts (includes diagram describing the problem)
- Friday, December 13th - Extracting publish date from draft history for direct comparison with non-drafts.
Work log
Tuesday, January 21st
Today I want to work out the data anomaly I have for dewiki. It looks like the number of newcomer-created drafts falls off abruptly in the middle of 2011. So I'd like to find some move events from that period (say, 2011/09-2011/10) to see what the revision comments look like and figure out whether my regex was matching them properly.
> select * from logging where log_type = "move" and log_action = "move" and log_timestamp BETWEEN "201109" AND "201110" limit 2;
+----------+----------+------------+----------------+----------+---------------+------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------------------------+-------------+----------------+----------+
| log_id   | log_type | log_action | log_timestamp  | log_user | log_namespace | log_title  | log_comment                                                                                                                                                            | log_params                   | log_deleted | log_user_text  | log_page |
+----------+----------+------------+----------------+----------+---------------+------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------------------------+-------------+----------------+----------+
| 37317605 | move     | move       | 20110901001127 |   995197 |             0 | Joseph_II. | BKL Modell II                                                                                                                                                          | Joseph II. (Begriffsklärung) |           0 | MFleischhacker |  6430697 |
| 37317717 | move     | move       | 20110901002353 |   708213 |             0 | Fundament  | Eine Verschiebung wird erforderlich zur Aufspaltung der Artikelseite in eine allgemeine Begriffsklärungsseite und in eine Artikelseite über das Fundament im Bauwesen. | Fundament (Bauwesen)         |           0 | A.Abdel-Rahim  |  6430701 |
+----------+----------+------------+----------------+----------+---------------+------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------------------------+-------------+----------------+----------+
2 rows in set (0.31 sec)
OK. Time to look for the page.
> select * from page where page_title = "Joseph_II._(Begriffsklärung)" and page_namespace = 0;
+---------+----------------+------------------------------+-------------------+--------------+------------------+-------------+----------------+----------------+--------------------+-------------+----------+-----------------------+
| page_id | page_namespace | page_title                   | page_restrictions | page_counter | page_is_redirect | page_is_new | page_random    | page_touched   | page_links_updated | page_latest | page_len | page_no_title_convert |
+---------+----------------+------------------------------+-------------------+--------------+------------------+-------------+----------------+----------------+--------------------+-------------+----------+-----------------------+
|   55428 |              0 | Joseph_II._(Begriffsklärung) |                   |            0 |                0 |           0 | 0.641909627543 | 20131024114813 | NULL               |   121808972 |      413 |                     0 |
+---------+----------------+------------------------------+-------------------+--------------+------------------+-------------+----------------+----------------+--------------------+-------------+----------+-----------------------+
1 row in set (0.04 sec)
> select rev_comment from revision where rev_page = 55428 and rev_user = 995197;
+----------------------------------------------------------------------------------+
| rev_comment                                                                      |
+----------------------------------------------------------------------------------+
| verschob „[[Joseph II.]]“ nach „[[Joseph II. (Begriffsklärung)]]“: BKL Modell II |
+----------------------------------------------------------------------------------+
1 row in set (0.13 sec)
OK. Now to check if my regex matches.
> select rev_comment RLIKE ".*(hat „|verschob die Seite )\\[\\[([^\]]+)\\]\\]“? nach „?\\[\\[([^\]]+)\\]\\]“?(.*)" from revision where rev_page = 55428 and rev_user = 995197;
+-----------------------------------------------------------------------------------------------------------+
| rev_comment RLIKE ".*(hat „|verschob die Seite )\\[\\[([^\]]+)\\]\\]“? nach „?\\[\\[([^\]]+)\\]\\]“?(.*)" |
+-----------------------------------------------------------------------------------------------------------+
|                                                                                                         0 |
+-----------------------------------------------------------------------------------------------------------+
1 row in set (0.04 sec)
It doesn't match. So now to figure out how to make the regex match. It should be pretty easy.
> select rev_comment RLIKE ".*(hat „|verschob „|verschob die Seite )\\[\\[([^\]]+)\\]\\]“? nach „?\\[\\[([^\]]+)\\]\\]“?(.*)" from revision where rev_page = 55428 and rev_user = 995197;
+----------------------------------------------------------------------------------------------------------------------+
| rev_comment RLIKE ".*(hat „|verschob „|verschob die Seite )\\[\\[([^\]]+)\\]\\]“? nach „?\\[\\[([^\]]+)\\]\\]“?(.*)" |
+----------------------------------------------------------------------------------------------------------------------+
|                                                                                                                    1 |
+----------------------------------------------------------------------------------------------------------------------+
1 row in set (0.03 sec)
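For checking the same thing outside the database, here is a minimal Python sketch of the old and new patterns run against the sample comment above. The prefix lists are copied from the two RLIKE expressions; the helper names (`build_move_pattern`, `parse_move_comment`) and the choice to return the captured titles are assumptions for illustration, not the move detector's actual code.

```python
import re

# Prefix alternatives taken from the two RLIKE patterns in this log entry.
OLD_PREFIXES = ["hat „", "verschob die Seite "]
NEW_PREFIXES = ["hat „", "verschob „", "verschob die Seite "]


def build_move_pattern(prefixes):
    """Build a move-comment regex from a list of literal prefixes (hypothetical helper)."""
    alternation = "|".join(re.escape(p) for p in prefixes)
    return re.compile(
        r".*(?:" + alternation + r")"
        r"\[\[([^\]]+)\]\]“? nach „?\[\[([^\]]+)\]\]“?(.*)"
    )


def parse_move_comment(pattern, comment):
    """Return (source title, target title, trailing text), or None if no match."""
    match = pattern.match(comment)
    if match is None:
        return None
    return match.groups()


# The revision comment returned by the query on rev_page = 55428.
comment = "verschob „[[Joseph II.]]“ nach „[[Joseph II. (Begriffsklärung)]]“: BKL Modell II"

old_result = parse_move_comment(build_move_pattern(OLD_PREFIXES), comment)
new_result = parse_move_comment(build_move_pattern(NEW_PREFIXES), comment)

print(old_result)  # the old prefixes do not cover "verschob „", so this is None
print(new_result)  # source title, target title, and the editor's own comment text
```

Keeping the prefixes in a list makes it cheap to add another alternative if a further variant of the dewiki move summary turns up in a different period.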
There we go. Time to kick off the move detector again. With any luck, I can have new data by the end of the day. --Halfak (WMF) (talk) 17:52, 21 January 2014 (UTC)