Research talk:Automated classification of article quality/Work log/2016-04-08
Friday, April 8, 2016
Quickly pasting some notes on the last run:
$ cat enwiki.observations.first_labelings.20160204.json | grep '"stub"' | wc
3005521 28984420 337131600
$ cat enwiki.observations.first_labelings.20160204.json | grep '"start"' | wc
1398595 13858836 159669311
$ cat enwiki.observations.first_labelings.20160204.json | grep '"c"' | wc
211116 2086434 23159257
$ cat enwiki.observations.first_labelings.20160204.json | grep '"b"' | wc
134194 1332090 14731302
$ cat enwiki.observations.first_labelings.20160204.json | grep '"ga"' | wc
29417 295572 3260669
$ cat enwiki.observations.first_labelings.20160204.json | grep '"fa"' | wc
6696 68043 747531
$ cat enwiki.observations.first_labelings.20160204.json | grep '"a"' | wc
4661 46356 512263
--Halfak (WMF) (talk) 19:28, 8 April 2016 (UTC)
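To make the class imbalance in those counts easier to see, here is a minimal sketch that turns the line counts (the first column of each `wc` result) into shares of the total. It assumes each line of the JSON file holds exactly one observation and matches only one of the grep patterns; the variable and function names are illustrative and not part of the original pipeline.

```python
# First-labeling counts per class, copied from the first column of the
# `wc` output above (lines matching each quoted label).
# Assumption: one observation per line, each matching exactly one pattern.
first_labeling_counts = {
    "stub": 3005521,
    "start": 1398595,
    "c": 211116,
    "b": 134194,
    "ga": 29417,
    "fa": 6696,
    "a": 4661,
}


def class_shares(counts):
    """Return each class's fraction of all counted observations."""
    n = sum(counts.values())
    return {label: count / n for label, count in counts.items()}


total = sum(first_labeling_counts.values())
shares = class_shares(first_labeling_counts)

# Print classes from most to least common.
for label, share in sorted(shares.items(), key=lambda kv: -kv[1]):
    print(f"{label:>5}  {first_labeling_counts[label]:>9,}  {share:6.2%}")
print(f"total  {total:>9,}")
```

Under those assumptions, stub and start together account for the large majority of first labelings, while ga, fa, and a are each well under one percent.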
Just got a chance to actually build the model with this data. It doesn't look good.
ScikitLearnClassifier
 - type: RF
 - params: warm_start=false, max_features="auto", random_state=null, verbose=0, bootstrap=true, n_estimators=501, min_samples_leaf=8, oob_score=false, balanced_sample=true, max_depth=null, center=true, min_samples_split=2, scale=true, criterion="gini", max_leaf_nodes=null, class_weight=null, n_jobs=1, min_weight_fraction_leaf=0.0, balanced_sample_weight=false
 - version: 0.3.1
 - trained: 2016-04-13T00:13:15.203516

Table:
         ~b    ~c    ~fa    ~ga    ~start    ~stub
-----  ----  ----  -----  -----  --------  -------
b       328   246    102    171       133       17
c       151   504     25    142       179       17
fa       70    27    689    186        17       17
ga       68    92    257    535        24        7
start    86   147      5     23       548      133
stub      6    12      1      3       151      804

Accuracy: 0.575

ROC-AUC:
-------  -----
'b'      0.782
'c'      0.843
'fa'     0.912
'ga'     0.864
'start'  0.873
'stub'   0.971
-------  -----

F1:
-----  -----
b      0.385
start  0.55
c      0.493
ga     0.524
stub   0.815
fa     0.661
-----  -----
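As a sanity check, the accuracy and F1 figures can be recomputed directly from the confusion table. The sketch below is plain Python and not the code that produced the report. It reads rows as actual classes and columns as predicted classes, which is an assumption, although both accuracy and F1 come out the same under either orientation.

```python
labels = ["b", "c", "fa", "ga", "start", "stub"]

# Confusion table copied from the report above.
# Assumption: rows are actual classes, columns (~b, ~c, ...) are predictions.
confusion = [
    [328, 246, 102, 171, 133, 17],   # b
    [151, 504, 25, 142, 179, 17],    # c
    [70, 27, 689, 186, 17, 17],      # fa
    [68, 92, 257, 535, 24, 7],       # ga
    [86, 147, 5, 23, 548, 133],      # start
    [6, 12, 1, 3, 151, 804],         # stub
]


def accuracy(matrix):
    """Fraction of observations on the diagonal (correctly classified)."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


def f1_scores(matrix, names):
    """Per-class F1: harmonic mean of precision and recall."""
    scores = {}
    for i, name in enumerate(names):
        true_positives = matrix[i][i]
        actual = sum(matrix[i])                    # row total
        predicted = sum(row[i] for row in matrix)  # column total
        precision = true_positives / predicted
        recall = true_positives / actual
        scores[name] = 2 * precision * recall / (precision + recall)
    return scores


acc = accuracy(confusion)
f1 = f1_scores(confusion, labels)

print(f"Accuracy: {acc:.3f}")
for name in labels:
    print(f"{name:>5}  {f1[name]:.3f}")
```

The recomputed values agree with the reported ones to three decimal places. The table also shows where the errors concentrate: b is spread across c, ga, start, and fa, and fa and ga are often confused with each other.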
This is still low accuracy. I think we should switch fully to Nettrom's strategy of accepting only the assessment classes that appear on the most recent version of the talk page. It'll take some hacking before we can do the next run. --EpochFail (talk) 14:07, 13 April 2016 (UTC)
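One way to read that filtering idea is as a rough sketch like the one below. Everything about the data shape is hypothetical: the `page_id` and `label` keys, the `latest_talk_labels` mapping, and the function name are assumptions for illustration, and extracting labels from the latest talk page revision is not shown.

```python
def keep_current_labels(observations, latest_talk_labels):
    """Keep observations whose label still appears on the latest talk page.

    observations: iterable of dicts with hypothetical keys "page_id" and
        "label" (not the real schema of the observations file).
    latest_talk_labels: hypothetical dict mapping page_id to the set of
        assessment classes found on the most recent talk page revision.
    """
    return [
        ob for ob in observations
        if ob["label"] in latest_talk_labels.get(ob["page_id"], set())
    ]


# Made-up example data, only to show the filter's behavior.
example_observations = [
    {"page_id": 1, "label": "stub"},   # page 1 has since been re-assessed
    {"page_id": 1, "label": "start"},
    {"page_id": 2, "label": "b"},
    {"page_id": 3, "label": "ga"},     # no labels known for page 3
]
example_latest = {1: {"start"}, 2: {"b", "c"}}

kept = keep_current_labels(example_observations, example_latest)
print(kept)
```

Observations for pages with no known current labels are dropped here; whether that is the right choice for the next run is an open question.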