Research talk:Revision scoring as a service/Work log/2016-02-01
Monday, February 1, 2016
February! And time to look at the Wikidata reverted edits that probably need review (because they might be vandalism).
> SELECT rev_id FROM wikidata_nonbot_reverted_sample WHERE NOT (trusted_edits OR trusted_user OR client_edit OR merge_edit) AND reverted ORDER BY RAND() LIMIT 100;
Here's the etherpad that I'll work from: https://etherpad.wikimedia.org/p/wikidata_reverted_edits_in_need_of_review Will post here when I'm done. --EpochFail (talk) 21:19, 1 February 2016 (UTC)
While I was working, I became curious about anonymous editors and how much more often they are reverted than registered editors under this new definition of "edits needing review".
> SELECT NOT (trusted_edits OR trusted_user OR client_edit OR merge_edit) AS needs_review, anon_user, COUNT(*) AS edits, SUM(reverted) AS reverted, SUM(reverted)/COUNT(*) AS prop FROM wikidata_nonbot_reverted_sample GROUP BY needs_review, anon_user;
+--------------+-----------+--------+----------+--------+
| needs_review | anon_user | edits  | reverted | prop   |
+--------------+-----------+--------+----------+--------+
|            0 |         0 | 466054 |     1260 | 0.0027 |
|            0 |         1 |     22 |        0 | 0.0000 |
|            1 |         0 |  15546 |      123 | 0.0079 |
|            1 |         1 |   6914 |      499 | 0.0722 |
+--------------+-----------+--------+----------+--------+
4 rows in set (0.33 sec)
So, regular edits by non-trusted registered editors seem to be reverted about 1/10th as often as anons. That's a pretty substantial gap. I wonder if we can attribute it entirely to vandalism or if registered user edits are just reviewed with less scrutiny. Let's find out. :) --EpochFail (talk) 21:24, 1 February 2016 (UTC)
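For anyone who wants to double-check the gap, here is a minimal Python sketch that recomputes the revert proportions from the counts in the query output above. The dictionary layout and the `revert_rate` helper are just illustrative names, not part of the actual analysis pipeline.

```python
# Counts copied from the query output: (needs_review, anon_user) -> (edits, reverted)
counts = {
    (0, 0): (466054, 1260),
    (0, 1): (22, 0),
    (1, 0): (15546, 123),
    (1, 1): (6914, 499),
}

def revert_rate(needs_review, anon_user):
    """Proportion of edits in the group that were reverted."""
    edits, reverted = counts[(needs_review, anon_user)]
    return reverted / edits

registered = revert_rate(1, 0)  # needs review, registered editors
anon = revert_rate(1, 1)        # needs review, anonymous editors
ratio = registered / anon       # how often registered are reverted relative to anons

print(f"registered: {registered:.4f}, anon: {anon:.4f}, ratio: {ratio:.2f}")
```

The ratio comes out at roughly 0.11, which is where the "about 1/10th" figure comes from.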
Reverted edits needing review
- wikidata:Special:Diff/259717617 -- Good faith mistake -- Changes instance of to something that is kind of irrelevant.
- wikidata:Special:Diff/220988680 -- Good edit -- Removed by client edit
- wikidata:Special:Diff/255426246 -- Good faith mistake -- Replaces "United States" with "Peru" with some additional relevant information
- wikidata:Special:Diff/205308980 -- Vandalism -- "Triple H" to "El Nariz H"
- wikidata:Special:Diff/198475507 -- Vandalism -- "Looove :)"
- wikidata:Special:Diff/255849953 -- Good edit -- Removed by a client edit
- wikidata:Special:Diff/275346493 -- Vandalism
- wikidata:Special:Diff/195786020 -- Vandalism -- Key mash
- wikidata:Special:Diff/203310306 -- Vandalism
- wikidata:Special:Diff/194417582 -- Vandalism -- Key mash
- wikidata:Special:Diff/202664398 -- Good faith mistake -- English in Chinese field
- wikidata:Special:Diff/211775788 -- Good faith mistake -- Qid in label
- wikidata:Special:Diff/196174780 -- Vandalism
- wikidata:Special:Diff/215260923 -- Good edit -- Removed by a client edit
- wikidata:Special:Diff/193583064 -- Vandalism
- wikidata:Special:Diff/208386821 -- Good edit
- wikidata:Special:Diff/213258810 -- Vandalism
- wikidata:Special:Diff/257683414 -- Vandalism
- wikidata:Special:Diff/236726199 -- Good faith mistake -- Adds artist name to ArtistId
- wikidata:Special:Diff/275717066 -- Vandalism
- wikidata:Special:Diff/186068435 -- Good faith mistake -- Adds "urdu" as an urdu label
- wikidata:Special:Diff/245652313 -- Good edit -- Removed by a client edit
- wikidata:Special:Diff/220959942 -- Good faith mistake -- Adds eswiki label to english label
- wikidata:Special:Diff/202980037 -- Vandalism
- wikidata:Special:Diff/254836600 -- Good faith mistake -- Adds "given name" to disambig
- wikidata:Special:Diff/206252846 -- Vandalism
- wikidata:Special:Diff/211351591 -- Good faith mistake -- Removes coord from a road
- wikidata:Special:Diff/267997380 -- Vandalism
- wikidata:Special:Diff/204069043 -- Good edit -- Removed by a client edit
- wikidata:Special:Diff/211580917 -- Vandalism -- Changes date to something asinine
- wikidata:Special:Diff/203758801 -- Good edit -- Removed by a client edit
- wikidata:Special:Diff/277188356 -- Vandalism
- wikidata:Special:Diff/212561649 -- Vandalism
- wikidata:Special:Diff/186506808 -- Vandalism
- wikidata:Special:Diff/254938346 -- Good faith mistake -- Assuming good-faith
- wikidata:Special:Diff/237932699 -- Good edit -- Removed by a client edit
- wikidata:Special:Diff/186462721 -- Vandalism
- wikidata:Special:Diff/186215555 -- Good edit -- Removed without explanation
- wikidata:Special:Diff/258818736 -- Good faith mistake -- Assuming good-faith
- wikidata:Special:Diff/225575026 -- Project page edit
- wikidata:Special:Diff/270981183 -- Vandalism
- wikidata:Special:Diff/213145054 -- Vandalism
- wikidata:Special:Diff/224362748 -- Vandalism
- wikidata:Special:Diff/222863198 -- Vandalism
- wikidata:Special:Diff/204719016 -- Vandalism
- wikidata:Special:Diff/221558539 -- Vandalism
- wikidata:Special:Diff/209266557 -- Vandalism
- wikidata:Special:Diff/254100681 -- Project page
- wikidata:Special:Diff/280472039 -- Good edit
- wikidata:Special:Diff/206748413 -- Vandalism
- wikidata:Special:Diff/273913717 -- Vandalism
- wikidata:Special:Diff/187036426 -- Vandalism -- Blanking
- wikidata:Special:Diff/238723310 -- Vandalism -- Adds self
- wikidata:Special:Diff/187511826 -- Good edit -- Removed by a client edit
- wikidata:Special:Diff/201566582 -- Good faith mistake -- Assuming good-faith
- wikidata:Special:Diff/187814113 -- Vandalism
- wikidata:Special:Diff/200658374 -- Good faith mistake -- Assuming good-faith
- wikidata:Special:Diff/267205872 -- Vandalism
- wikidata:Special:Diff/189346966 -- Vandalism -- "AKA God"
- wikidata:Special:Diff/185736803 -- Vandalism
- wikidata:Special:Diff/237003795 -- Good faith mistake
- wikidata:Special:Diff/189171368 -- Vandalism
- wikidata:Special:Diff/204682619 -- Vandalism
- wikidata:Special:Diff/239747398 -- Vandalism
- wikidata:Special:Diff/243375161 -- Vandalism
- wikidata:Special:Diff/209457921 -- Vandalism
- wikidata:Special:Diff/225566304 -- Good edit -- Removed by a client edit
- wikidata:Special:Diff/228358914 -- Good edit -- Likely to be removed on the local wiki
- wikidata:Special:Diff/195076787 -- Vandalism
- wikidata:Special:Diff/205139908 -- Vandalism
- wikidata:Special:Diff/211535761 -- Vandalism
- wikidata:Special:Diff/212033842 -- Good edit -- Removed by a client edit
- wikidata:Special:Diff/282161994 -- Vandalism
- wikidata:Special:Diff/191593033 -- Good faith mistake -- adding description in wrong language
- wikidata:Special:Diff/213187505 -- Good faith mistake -- Not following label policies
- wikidata:Special:Diff/208429469 -- Good edit -- The site link got deleted later
- wikidata:Special:Diff/255158534 -- Good faith mistake
- wikidata:Special:Diff/247168926 -- Vandalism -- deleting completely ok statement
- wikidata:Special:Diff/186576506 -- Good faith mistake -- the user should've added this as an alias rather than as a label, judging by this https://uk.wikipedia.org/wiki/%D0%93%D1%80%D0%B5%D0%B3%D0%BE%D1%80%D1%96
- wikidata:Special:Diff/206015460 -- Vandalism -- messing with identifiers
- wikidata:Special:Diff/192426280 -- Vandalism -- messing with commons category
- wikidata:Special:Diff/264301907 -- Vandalism -- messing with P31
- wikidata:Special:Diff/208540606 -- Vandalism -- deleting some parts of en label
- wikidata:Special:Diff/222852121 -- Good edit -- deleted in client later
- wikidata:Special:Diff/253547015 -- Vandalism
- wikidata:Special:Diff/257589717 -- Vandalism -- messing with commons category
- wikidata:Special:Diff/210622326 -- Vandalism -- messing with place of birth, lol do we have a "big dick lake"
- wikidata:Special:Diff/274834655 -- Vandalism -- messing with identifiers
- wikidata:Special:Diff/260067812 -- Vandalism -- messing with commons category
- wikidata:Special:Diff/201954534 -- Vandalism
- wikidata:Special:Diff/203553132 -- Vandalism
- wikidata:Special:Diff/194339513 -- Vandalism -- adding language as label
- wikidata:Special:Diff/247365267 -- Vandalism
- wikidata:Special:Diff/268424755 -- Good edit -- added a link to pl.wp but that got deleted
- wikidata:Special:Diff/197385903 -- Vandalism
- wikidata:Special:Diff/195589006 -- Good faith mistake -- adding nothing useful as description
- wikidata:Special:Diff/221187455 -- Vandalism -- changing label from battle of Nile to battle of banana
- wikidata:Special:Diff/283819984 -- Good faith mistake -- Added disambiguation page of "John Dick" as mayor of the city who is "John Dickert"
- wikidata:Special:Diff/203697716 -- Vandalism -- Changing en label
- wikidata:Special:Diff/188511486 -- Vandalism -- adding improper content about Katy Perry
Vandalism | Good faith mistake | Good edit | Not mainspace |
---|---|---|---|
61 | 20 | 17 | 2 |
OK. 61% of this is vandalism and 81% was clearly damaging (vandalism plus good-faith mistakes). If we filter out the edits that look like they were reverted because of site-link deletions (11), that increases the proportions to 69% (61/89) for vandalism and 91% (81/89) for clearly damaging. --EpochFail (talk) 23:26, 1 February 2016 (UTC)
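The arithmetic behind these proportions can be sketched in a few lines of Python. The tally comes from the table above and the 11 site-link cases from the count quoted in the paragraph; the variable names and the `pct` helper are only illustrative.

```python
# Tally from the table above
tally = {
    "vandalism": 61,
    "good_faith_mistake": 20,
    "good_edit": 17,
    "not_mainspace": 2,
}
# Edits that look like they were reverted because of site-link deletions
site_link_reverts = 11

total = sum(tally.values())
damaging = tally["vandalism"] + tally["good_faith_mistake"]

def pct(numerator, denominator):
    """Percentage rounded to one decimal place."""
    return round(100 * numerator / denominator, 1)

# Proportions over the full sample of 100
unfiltered = (pct(tally["vandalism"], total), pct(damaging, total))

# Proportions after dropping the site-link cases from the denominator
filtered_total = total - site_link_reverts
filtered = (pct(tally["vandalism"], filtered_total), pct(damaging, filtered_total))

print(unfiltered)  # (61.0, 81.0)
print(filtered)    # (68.5, 91.0)
```

Only the denominator changes under the filter, since the site-link cases were all labeled as good edits.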
Non-reverted edits needing review
Finally. This is the last set that seems to need review. I want to look at a random sample of non-reverted edits that don't fit in the "don't need review" groups so that we can see how often vandalism and other types of damage are missed.
> SELECT rev_id FROM wikidata_nonbot_reverted_sample WHERE NOT (trusted_edits OR trusted_user OR client_edit OR merge_edit) AND NOT reverted ORDER BY RAND() LIMIT 100;
OK. here it is! https://etherpad.wikimedia.org/p/wikidata_non-reverted_edits_in_need_of_review --EpochFail (talk) 23:31, 1 February 2016 (UTC)
- wikidata:Special:Diff/210736379 -- Good edit
- wikidata:Special:Diff/282798211 -- Good edit
- wikidata:Special:Diff/236625099 -- Good edit
- wikidata:Special:Diff/220981753 -- Good edit
- wikidata:Special:Diff/209104150 -- Good edit
- wikidata:Special:Diff/236713855 -- Good edit
- wikidata:Special:Diff/283656763 -- Good edit
- wikidata:Special:Diff/206677672 -- Not in main ns
- wikidata:Special:Diff/286334152 -- Good edit
- wikidata:Special:Diff/267122898 -- Good edit
- wikidata:Special:Diff/230629457 -- Good edit
- wikidata:Special:Diff/186887985 -- Good edit
- wikidata:Special:Diff/257448168 -- Good edit
- wikidata:Special:Diff/254910443 -- Good edit
- wikidata:Special:Diff/188815232 -- Good edit
- wikidata:Special:Diff/199280581 -- Good edit
- wikidata:Special:Diff/221774437 -- Good edit
- wikidata:Special:Diff/201545085 -- Good edit
- wikidata:Special:Diff/238598737 -- Good edit
- wikidata:Special:Diff/285941911 -- Good edit
- wikidata:Special:Diff/284159442 -- Good edit
- wikidata:Special:Diff/186510126 -- Good edit -- Not great. Adds the containing geo-political region as a description
- wikidata:Special:Diff/269294848 -- Good edit
- wikidata:Special:Diff/191748169 -- Good edit
- wikidata:Special:Diff/198333170 -- Good edit
- wikidata:Special:Diff/280038271 -- Good edit
- wikidata:Special:Diff/253588302 -- Good edit
- wikidata:Special:Diff/221423745 -- Good edit
- wikidata:Special:Diff/260077677 -- Good edit
- wikidata:Special:Diff/207147539 -- Good edit
- wikidata:Special:Diff/225636394 -- Good edit
- wikidata:Special:Diff/219656446 -- Good edit
- wikidata:Special:Diff/214951486 -- Good edit
- wikidata:Special:Diff/204491174 -- Good edit
- wikidata:Special:Diff/249348628 -- Good faith mistake -- It's strange, editor is trusted but edit is definitely wrong: wikidata:User_talk:Seewolf#A_question
- wikidata:Special:Diff/210995820 -- Good edit
- wikidata:Special:Diff/262989441 -- Good edit
- wikidata:Special:Diff/238736157 -- Good faith mistake -- Changes links to disambiguation pages
- wikidata:Special:Diff/226275194 -- Good edit
- wikidata:Special:Diff/269757229 -- Good edit
- wikidata:Special:Diff/257038780 -- Good edit
- wikidata:Special:Diff/210920378 -- Good edit
- wikidata:Special:Diff/219520605 -- Good edit -- Google translate
- wikidata:Special:Diff/228438954 -- Good edit
- wikidata:Special:Diff/244806504 -- Good edit
- wikidata:Special:Diff/238836236 -- Good edit
- wikidata:Special:Diff/189914996 -- Good edit
- wikidata:Special:Diff/254514416 -- Good edit
- wikidata:Special:Diff/254301964 -- Good edit
- wikidata:Special:Diff/207791339 -- Good edit
- wikidata:Special:Diff/236115304 -- Good edit
- wikidata:Special:Diff/197154763 -- Good edit
- wikidata:Special:Diff/239214904 -- Good edit
- wikidata:Special:Diff/211562501 -- Good edit
- wikidata:Special:Diff/269656801 -- Good edit
- wikidata:Special:Diff/236207716 -- Good edit
- wikidata:Special:Diff/272311946 -- Good edit
- wikidata:Special:Diff/222716787 -- Good edit
- wikidata:Special:Diff/239324191 -- Good edit
- wikidata:Special:Diff/270699076 -- Good edit
- wikidata:Special:Diff/277573455 -- Good edit -- Using google translate
- wikidata:Special:Diff/225900863 -- Good edit
- wikidata:Special:Diff/260058207 -- Good edit
- wikidata:Special:Diff/224384338 -- Good edit
- wikidata:Special:Diff/222599502 -- Good edit
- wikidata:Special:Diff/255194242 -- Good edit
- wikidata:Special:Diff/239933902 -- Good edit
- wikidata:Special:Diff/212782409 -- Good edit
- wikidata:Special:Diff/213531541 -- Good edit
- wikidata:Special:Diff/272168487 -- Good edit
- wikidata:Special:Diff/189304829 -- Good edit
- wikidata:Special:Diff/254211716 -- Good edit
- wikidata:Special:Diff/252249797 -- Good edit
- wikidata:Special:Diff/222409831 -- Good edit
- wikidata:Special:Diff/200697766 -- Vandalism
- wikidata:Special:Diff/284419999 -- Good edit
- wikidata:Special:Diff/220126340 -- Good faith mistake -- should be done another way
- wikidata:Special:Diff/219958626 -- Good edit
- wikidata:Special:Diff/204880305 -- Good edit
- wikidata:Special:Diff/240199178 -- Good edit
- wikidata:Special:Diff/251957534 -- Good edit
- wikidata:Special:Diff/195250415 -- Good faith mistake -- Can't seem to figure out what they are doing
- wikidata:Special:Diff/276802714 -- Good edit
- wikidata:Special:Diff/275178359 -- Good edit
- wikidata:Special:Diff/258042745 -- Good edit
- wikidata:Special:Diff/283899578 -- Good edit
- wikidata:Special:Diff/247817552 -- Good edit -- fixing mistake
- wikidata:Special:Diff/279037439 -- Good edit
- wikidata:Special:Diff/215195849 -- Good edit -- Client edit changes link page that was moved to user sandbox rather than deleted.
- wikidata:Special:Diff/224762201 -- Good edit
- wikidata:Special:Diff/223229480 -- Good edit
- wikidata:Special:Diff/191275933 -- Good edit -- Fixing Vandalism
- wikidata:Special:Diff/252101397 -- Good edit -- but needs some work
- wikidata:Special:Diff/264484803 -- Good edit
- wikidata:Special:Diff/239145015 -- Good edit
- wikidata:Special:Diff/262829723 -- Good edit
- wikidata:Special:Diff/208837421 -- Not in main ns
- wikidata:Special:Diff/197997987 -- Good edit -- Using google translate
- wikidata:Special:Diff/258281806 -- Good edit
- wikidata:Special:Diff/188834034 -- Good edit
Vandalism | Good faith mistake | Good edit |
---|---|---|
1 | 3 | 94 |
OK. In this set (98 edits once the 2 outside the main namespace are set aside), we get 1/98 = 1.0% vandalism. It didn't show up as reverted because the cleanup took a long time and wasn't a revert or a rollback. We get 3/98 = 3.1% good-faith mistakes, one of which we are looking into because it is hard to figure out what is going on. And the rest is good. That's 94/98 = 96%. --EpochFail (talk) 03:35, 2 February 2016 (UTC)
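Tallies like these can be produced from the annotated list rather than by hand. Below is a rough Python sketch, assuming each line follows the `- <diff link> -- <label> -- <optional note>` pattern; lines that use a different separator would need normalizing first. The `label_of` helper and the five sample lines (copied from the list above) are for illustration only, so the shares it prints are not the real results.

```python
from collections import Counter

# A handful of lines copied from the review list, for illustration only
sample_lines = [
    "- wikidata:Special:Diff/210736379 -- Good edit",
    "- wikidata:Special:Diff/200697766 -- Vandalism",
    "- wikidata:Special:Diff/220126340 -- Good faith mistake -- should be done another way",
    "- wikidata:Special:Diff/208837421 -- Not in main ns",
    "- wikidata:Special:Diff/247817552 -- Good edit -- fixing mistake",
]

def label_of(line):
    """Return the label field of an annotated line, or None if it is missing."""
    parts = line.split(" -- ")
    return parts[1].strip() if len(parts) > 1 else None

tally = Counter(label_of(line) for line in sample_lines)

# Leave out edits outside the main namespace before computing shares
reviewed = {label: n for label, n in tally.items() if label != "Not in main ns"}
denominator = sum(reviewed.values())
shares = {label: n / denominator for label, n in reviewed.items()}

for label, share in shares.items():
    print(f"{label}: {reviewed[label]}/{denominator} = {share:.1%}")
```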