Research talk:Automated classification of draft quality/Work log/2016-12-01
Thursday, December 1, 2016
Today, I'm analyzing how good my PCFG models are at differentiating sentences from FA, spam, vandalism, and attack articles.
Essentially, I trained the four models and then plotted the log_proba / productions for each of the input sets.
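That per-sentence quantity (log_proba divided by the number of productions) can be sketched roughly like this. This is a minimal illustration under my own assumptions, not the actual code: here a "model" is just a dict mapping production strings to log probabilities, and the smoothing fallback for unseen productions is made up.

```python
import math

def normalized_log_proba(productions, model, oov_log_proba=math.log(1e-6)):
    """Mean log probability of a sentence's productions under a PCFG model.

    `model` maps production strings to log probabilities; unseen
    productions fall back to a small smoothing value (an assumption here).
    """
    total = sum(model.get(p, oov_log_proba) for p in productions)
    return total / len(productions)

# Toy example: two hypothetical models scoring the same parsed sentence.
fa_model = {"S -> NP VP": math.log(0.9), "NP -> DT NN": math.log(0.8)}
spam_model = {"S -> NP VP": math.log(0.5), "NP -> DT NN": math.log(0.1)}
sentence = ["S -> NP VP", "NP -> DT NN"]

# The model trained on content like this sentence should score it higher.
print(normalized_log_proba(sentence, fa_model) >
      normalized_log_proba(sentence, spam_model))  # True
```

Dividing by the production count keeps long sentences from dominating the score, so sentences of different lengths are comparable in the plots.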
OK, so this plot shows that the PCFGs can differentiate, but only to a minor extent; there's certainly a lot of overlap between the different models. --EpochFail (talk) 19:57, 1 December 2016 (UTC)
So in thinking about how there's probably a clear difference between the scores of the various models, I decided to try something different. In the following plot, I subtract the log_proba of the model applied to its own content from that of the given model. So essentially, this plot shows us how well *the other models* differentiate from the own model. We want to see mostly negative values and little overlap on or above zero.
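A minimal sketch of that subtraction, assuming each model's score is already the normalized log_proba value (the stub numbers and the function name `differentiation_scores` are mine, purely illustrative). Negative values mean the other model fits the content worse than the model trained on that content class, which is the hoped-for outcome.

```python
def differentiation_scores(scores_by_model, own_label):
    """For content of class `own_label`, subtract the own model's score
    from each other model's score.  Negative => other model fits worse."""
    own = scores_by_model[own_label]
    return {label: score - own
            for label, score in scores_by_model.items() if label != own_label}

# Hypothetical normalized log probabilities for one FA sentence under
# each of the four models (made-up numbers).
scores = {"fa": -0.2, "spam": -1.5, "vandalism": -1.1, "attack": -1.4}
diffs = differentiation_scores(scores, "fa")
print(diffs)  # all values negative: the other models fit FA content worse
```

In these plots, mass on or above zero for some other model (like the attack model on vandalism content) means that model fits the content at least as well as the model trained on it.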
This looks a lot more promising. It looks like we can differentiate FA and attacks pretty well. We can differentiate spam pretty well too, but it's interesting to see that spam looks a lot like FA content. Vandalism is weird. A large part of the attack model fits vandalism better than the vandalism model does. That shouldn't be possible. --EpochFail (talk) 22:33, 1 December 2016 (UTC)