Template A/B testing/Test Edits Analyses

From Meta, a Wikimedia project coordination wiki

Overview

This analysis tracks the behaviour of editors warned by 28 bot and of editors reverted by Rscprinter bot.



Analysis Results

28 bot Registered (templates 145, 146)

The 28 bot templates for registered users showed no significant effect on edits (all namespaces) or blocks after the posting. In the blocks model no user in either group was blocked, so the metric coefficient could not be estimated:


Logistic Regression Analysis on edit events after the posting - R Output

Call:
glm(formula = template ~ metric, family = binomial(link = "logit"), 
    data = temp_df)

Deviance Residuals: 
   Min      1Q  Median      3Q     Max  
-1.337  -1.318   1.026   1.043   1.043  

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)   0.3254     0.3640   0.894    0.371
metric        0.0423     0.5661   0.075    0.940

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 71.938  on 52  degrees of freedom
Residual deviance: 71.932  on 51  degrees of freedom
AIC: 75.932

Number of Fisher Scoring iterations: 4

[1] "Summary of metric for test:"
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
 0.0000  0.0000  0.0000  0.4194  1.0000  1.0000 

[1] "Summary of metric for control:"
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
 0.0000  0.0000  0.0000  0.4091  1.0000  1.0000 
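
As a reading aid for the coefficient table above, the following minimal sketch recomputes the Wald z value and two-sided p-value from the reported estimate and standard error. The odds ratio and the approximate 95% interval are additions for illustration and do not appear in the original output.

```r
# Values copied from the "metric" row of the coefficient table above.
estimate  <- 0.0423
std_error <- 0.5661

# Wald statistic and two-sided p-value (the "z value" and "Pr(>|z|)" columns).
z_value <- estimate / std_error
p_value <- 2 * pnorm(-abs(z_value))

# Odds ratio and an approximate 95% Wald interval (not in the original output).
odds_ratio <- exp(estimate)
ci_95 <- exp(estimate + c(-1, 1) * qnorm(0.975) * std_error)

print(round(c(z = z_value, p = p_value, odds_ratio = odds_ratio), 3))
print(round(ci_95, 3))
```

An interval that spans 1 is consistent with the non-significant result reported for this template pair.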


Logistic Regression Analysis on blocks after the posting - R Output

Call:
glm(formula = template ~ metric, family = binomial(link = "logit"), 
    data = temp_df)

Deviance Residuals: 
   Min      1Q  Median      3Q     Max  
-1.326  -1.326   1.036   1.036   1.036  

Coefficients: (1 not defined because of singularities)
            Estimate Std. Error z value Pr(>|z|)
(Intercept)   0.3429     0.2788    1.23    0.219
metric            NA         NA      NA       NA

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 71.938  on 52  degrees of freedom
Residual deviance: 71.938  on 52  degrees of freedom
AIC: 73.938

Number of Fisher Scoring iterations: 4

[1] "Summary of metric for test:"
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
      0       0       0       0       0       0 
[1] "Summary of metric for control:"
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
      0       0       0       0       0       0 
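
The "1 not defined because of singularities" note above arises when a predictor is constant: a metric that is 0 for every user is collinear with the intercept. The sketch below reproduces this on hypothetical data. The group sizes (31 test, 22 control) are an assumption inferred from the 53 observations and the reported group means; they are not stated in the source.

```r
# Hypothetical reconstruction: 31 test and 22 control users (inferred sizes,
# not stated in the source), none of whom was blocked after the posting.
temp_df <- data.frame(
  template = c(rep(1, 31), rep(0, 22)),
  metric   = rep(0, 53)
)

fit <- glm(template ~ metric, family = binomial(link = "logit"), data = temp_df)

# The constant metric column is collinear with the intercept,
# so glm() reports its coefficient as NA.
print(coef(fit))
```

With no blocks in either group, the model reduces to an intercept-only fit and cannot say anything about a difference between the templates.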


28 bot Non-registered (templates 145, 146)

For non-registered users there was a strong effect; however, the selection of the groups is biased. For blocks and warnings before the posting, the test group had rates of 0.0% and 0.6%, while the control group had rates of 30.4% and 92.4%, respectively.
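
The imbalance described above can be illustrated with a simple balance check on the pre-posting block rates. This is a sketch under stated assumptions: the counts are hypothetical, inferred from the quoted percentages and from group sizes of 337 (test) and 79 (control) that are not given in the source. Fisher's exact test is a swapped-in technique and not part of the original analysis.

```r
# Hypothetical counts inferred from the reported rates, assuming 337 test and
# 79 control users (group sizes are not stated in the source).
prior_blocks <- matrix(
  c(0, 24,     # blocked before the posting: test, control
    337, 55),  # not blocked before the posting: test, control
  nrow = 2,
  dimnames = list(group = c("test", "control"),
                  blocked_before = c("yes", "no"))
)

# Row-wise rates in percent, for comparison with the 0.0% and 30.4% quoted above.
print(round(100 * prop.table(prior_blocks, margin = 1), 1))

# Fisher's exact test as a balance check (not the original authors' method).
balance <- fisher.test(prior_blocks)
print(balance$p.value)
```

A very small p-value here indicates that the groups already differed before the posting, which is why the post-posting effect should not be read as caused by the template.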


Logistic Regression Analysis on edit events after the posting - R Output

Call:
glm(formula = template ~ metric, family = binomial(link = "logit"), 
    data = temp_df)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-1.9111   0.5926   0.5926   0.5926   1.2609  

Coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)   1.6505     0.1387  11.904  < 2e-16 ***
metric       -1.8447     0.3866  -4.771 1.83e-06 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 404.42  on 415  degrees of freedom
Residual deviance: 382.55  on 414  degrees of freedom
AIC: 386.55

Number of Fisher Scoring iterations: 3

[1] "Summary of metric for test:"
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
0.00000 0.00000 0.00000 0.04154 0.00000 1.00000 
[1] "Summary of metric for control:"
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
 0.0000  0.0000  0.0000  0.2152  0.0000  1.0000 
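
Because the model formula is template ~ metric, the coefficients describe group membership given the outcome rather than the outcome given the group. The sketch below converts the reported coefficients to probabilities to make that reading concrete. Which group is coded as 1 is not stated in the source; the reported means appear consistent with 1 denoting the test group, but that is an assumption.

```r
# Coefficients copied from the table above.
intercept <- 1.6505
metric    <- -1.8447

# plogis() is the inverse logit: the estimated share of users with template = 1
# among those without (metric = 0) and with (metric = 1) an edit after the posting.
share_no_edit <- plogis(intercept)
share_edit    <- plogis(intercept + metric)

print(round(c(no_edit = share_no_edit, edit = share_edit), 3))

# Odds ratio for metric = 1 versus metric = 0.
print(round(exp(metric), 3))
```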


Logistic Regression Analysis on blocks after the posting - R Output

Call:
glm(formula = template ~ metric, family = binomial(link = "logit"), 
    data = temp_df)

Deviance Residuals: 
   Min      1Q  Median      3Q     Max  
-1.850   0.631   0.631   0.631   1.893  

Coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)   1.5130     0.1284  11.783  < 2e-16 ***
metric       -3.1225     1.1029  -2.831  0.00464 ** 
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 404.42  on 415  degrees of freedom
Residual deviance: 392.56  on 414  degrees of freedom
AIC: 396.56

Number of Fisher Scoring iterations: 4

[1] "Summary of metric for test:"
    Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
0.000000 0.000000 0.000000 0.002967 0.000000 1.000000 

[1] "Summary of metric for control:"
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
0.00000 0.00000 0.00000 0.06329 0.00000 1.00000 


28 bot vs Rscprinter bot

The Rscprinter bot typically reverts edits that are classified as "Reverting vandalism and test edits" and, more importantly, does not leave template warnings on the talk pages of reverted users. This bot was compared to 28 bot.

To get data for the Rscprinter bot reverts, it was necessary to create a way to modify revision data based on experimental constraints: in this case, to observe only main namespace revisions and to extract the recipient of the revert from the revision comment. The details of this modification can be found here.
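
The two constraints described above (main namespace only, recipient taken from the revision comment) could be sketched as follows. Everything here is hypothetical: the column names page_namespace and rev_comment, the helper extract_recipient, and both comment formats are assumptions for illustration, since the source does not show the bot's actual comment format or the original code.

```r
# Hypothetical revision data; column names and comment formats are assumptions.
revisions <- data.frame(
  page_namespace = c(0, 0, 1, 0),
  rev_comment = c(
    "Reverted edits by [[Special:Contributions/Example user|Example user]] to last version by Another user",
    "Reverting vandalism and test edits by 192.0.2.17",
    "Reverted edits by [[Special:Contributions/Talk page user|Talk page user]]",
    "Fixed a typo"
  ),
  stringsAsFactors = FALSE
)

# Hypothetical helper: returns the reverted user's name, or NA if none is found.
extract_recipient <- function(comment) {
  patterns <- c(
    "\\[\\[Special:Contributions/([^|]+)\\|",  # linked form
    "edits by (.+)$"                           # plain form at the end of the comment
  )
  for (pattern in patterns) {
    hit <- regmatches(comment, regexec(pattern, comment))[[1]]
    if (length(hit) == 2) return(hit[2])
  }
  NA_character_
}

# Keep main namespace revisions (namespace 0), then keep rows with a recipient.
main_ns <- revisions[revisions$page_namespace == 0, ]
main_ns$recipient <- vapply(main_ns$rev_comment, extract_recipient,
                            character(1), USE.NAMES = FALSE)
reverts <- main_ns[!is.na(main_ns$recipient), ]
print(reverts$recipient)
```

Trying the linked form first avoids capturing wiki markup when a comment happens to match both patterns.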

The final analysis showed that, among registered users, those who received template warnings from 28 bot significantly outperformed those reverted by Rscprinter bot in the share who made at least one edit in any namespace after the revert (and posting): 41.9% versus 9.1% (p = 0.0037). Blocks after the revert did not differ significantly between the two groups. Non-registered users who had not received previous warnings were also observed; however, there was no significant effect on edits or blocks after the posting in this case.


Logistic Regression Analysis, Registered Users, Edit Events - R Output

Call:
glm(formula = template ~ metric, family = binomial(link = "logit"), 
    data = temp_df)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-1.2557  -1.2557  -0.5553   1.1010   1.9728  

Coefficients:
            Estimate Std. Error z value Pr(>|z|)   
(Intercept)   0.1823     0.2708   0.673  0.50078   
metric       -1.9741     0.6799  -2.904  0.00369 **
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 104.039  on 75  degrees of freedom
Residual deviance:  93.016  on 74  degrees of freedom
AIC: 97.016

Number of Fisher Scoring iterations: 4

[1] "Summary of metric for Rscprinterbot:"
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
0.00000 0.00000 0.00000 0.09091 0.00000 1.00000 

[1] "Summary of metric for 28bot:"
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
 0.0000  0.0000  0.0000  0.4186  1.0000  1.0000 
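
As a cross-check on the Wald test in the coefficient table, the reported deviances allow a likelihood-ratio test of the same effect. This test is an addition for illustration and is not part of the original output; it uses only the deviances and degrees of freedom printed above.

```r
# Deviances and degrees of freedom copied from the output above.
null_deviance     <- 104.039  # on 75 degrees of freedom
residual_deviance <- 93.016   # on 74 degrees of freedom

# The drop in deviance is compared to a chi-squared distribution
# with degrees of freedom equal to the number of added parameters.
lr_statistic <- null_deviance - residual_deviance
df_diff      <- 75 - 74
lr_p_value   <- pchisq(lr_statistic, df = df_diff, lower.tail = FALSE)

print(round(c(statistic = lr_statistic, p = lr_p_value), 4))
```

With only 76 observations the likelihood-ratio test is often preferred to the Wald test; here both point in the same direction.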


Logistic Regression Analysis, Non-Registered Users, Edit Events - R Output

Call:
glm(formula = template ~ metric, family = binomial(link = "logit"), 
    data = temp_df)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-0.9005  -0.7858  -0.7858   1.6283   1.6283  

Coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)  -1.0169     0.1070  -9.507   <2e-16 ***
metric        0.3238     0.4751   0.682    0.496    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 545.84  on 468  degrees of freedom
Residual deviance: 545.39  on 467  degrees of freedom
AIC: 549.39

Number of Fisher Scoring iterations: 4

[1] "Summary of metric for rscprinterbot:"
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
0.00000 0.00000 0.00000 0.05556 0.00000 1.00000 

[1] "Summary of metric for 28bot:"
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
0.00000 0.00000 0.00000 0.04082 0.00000 1.00000