Research talk:Revision scoring as a service/Work log/2015-07-22

From Meta, a Wikimedia project coordination wiki

Wednesday, July 22, 2015

White Cat asked me to check how many tasks lack any labels in our ongoing Wiki labels campaigns.

wikilabels=> SELECT campaign.id, wiki, COUNT(*) FROM campaign INNER JOIN task ON campaign_id = campaign.id LEFT JOIN label ON task_id = task.id WHERE task_id IS NULL GROUP BY campaign.id, wiki;
 id |  wiki  | count 
----+--------+-------
  4 | enwiki |   602
  9 | frwiki | 19949
  6 | fawiki |   261
  3 | ptwiki |     4
  5 | trwiki |  1546
  8 | azwiki | 20000
  7 | ptwiki |  1058
(7 rows)
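
The query above is a LEFT JOIN anti-join: tasks with no matching row in label come back with a NULL task_id, and the WHERE clause keeps only those. Here is a minimal sketch of the same pattern on toy data, using Python's sqlite3 module in place of the production PostgreSQL database. The three-table schema is an assumption reduced to the columns the query touches, and label.user_id is a hypothetical column added only to make the rows distinguishable.

```python
import sqlite3

# Toy in-memory database; the schema is an assumption, not the real wikilabels one.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE campaign (id INTEGER PRIMARY KEY, wiki TEXT);
CREATE TABLE task (id INTEGER PRIMARY KEY, campaign_id INTEGER);
CREATE TABLE label (task_id INTEGER, user_id INTEGER);  -- user_id is hypothetical
INSERT INTO campaign VALUES (1, 'enwiki'), (2, 'frwiki');
INSERT INTO task VALUES (1, 1), (2, 1), (3, 1), (4, 2), (5, 2);
INSERT INTO label VALUES (1, 10), (1, 11), (2, 10);
""")

# Same shape as the query in the log: unlabeled tasks survive the LEFT JOIN
# with task_id = NULL, and the WHERE clause keeps only those rows.
unlabeled = conn.execute("""
SELECT campaign.id, wiki, COUNT(*)
FROM campaign
INNER JOIN task ON campaign_id = campaign.id
LEFT JOIN label ON task_id = task.id
WHERE task_id IS NULL
GROUP BY campaign.id, wiki
ORDER BY campaign.id
""").fetchall()

for row in unlabeled:
    print(row)
```

In the toy data, task 3 (enwiki) and tasks 4 and 5 (frwiki) have no labels, so those are the only tasks counted.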

It looks like we have a lot of duplicate labels for enwiki due to running the auto-labeling after people got started with labeling.

Let's check how much effort we wasted (this will also let us check the autolabeling).

wikilabels=> SELECT wiki, SUM(CAST(labels > 1 AS INT)) FROM (SELECT wiki, task_id, COUNT(label.*) AS labels FROM campaign INNER JOIN task ON campaign_id = campaign.id LEFT JOIN label ON task_id = task.id WHERE task_id IS NOT NULL GROUP BY wiki, task_id) AS foo GROUP BY wiki;
  wiki  | sum 
--------+-----
 enwiki | 702
 fawiki | 187
 frwiki |   0
 ptwiki | 285
 trwiki |  57
(5 rows)
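
Note that the sum above counts tasks with more than one label, not the surplus labels themselves; the two numbers only match if no task has three or more labels. Here is a minimal sketch that reports both figures on toy data, again using Python's sqlite3 module as a stand-in for PostgreSQL. The schema and the user_id column are assumptions, and COUNT(label.task_id) replaces COUNT(label.*), which SQLite does not accept.

```python
import sqlite3

# Toy in-memory database; the schema is an assumption, not the real wikilabels one.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE campaign (id INTEGER PRIMARY KEY, wiki TEXT);
CREATE TABLE task (id INTEGER PRIMARY KEY, campaign_id INTEGER);
CREATE TABLE label (task_id INTEGER, user_id INTEGER);  -- user_id is hypothetical
INSERT INTO campaign VALUES (1, 'enwiki'), (2, 'frwiki');
INSERT INTO task VALUES (1, 1), (2, 1), (3, 1), (4, 2), (5, 2);
INSERT INTO label VALUES (1, 10), (1, 11), (1, 12), (2, 10), (4, 10), (4, 11);
""")

# Inner query: label count per labeled task.
# Outer query: tasks with >1 label, plus the surplus labels beyond the first.
per_wiki = conn.execute("""
SELECT wiki,
       SUM(CAST(labels > 1 AS INT)) AS multi_labeled_tasks,
       SUM(CASE WHEN labels > 1 THEN labels - 1 ELSE 0 END) AS extra_labels
FROM (
    SELECT wiki, task.id AS task_id, COUNT(label.task_id) AS labels
    FROM campaign
    INNER JOIN task ON campaign_id = campaign.id
    LEFT JOIN label ON label.task_id = task.id
    WHERE label.task_id IS NOT NULL
    GROUP BY wiki, task.id
) AS per_task
GROUP BY wiki
ORDER BY wiki
""").fetchall()

for row in per_wiki:
    print(row)
```

In the toy data, enwiki task 1 has three labels, so it counts as one multi-labeled task but contributes two surplus labels.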

So in enwiki, 702 tasks ended up with more than one label, i.e. about 702 human labels that we didn't need to finish the campaign due to autolabeling. --EpochFail (talk) 14:52, 22 July 2015 (UTC)