Research talk:Expanding Wikipedia articles across languages/Data
@Tizianopiccardi: I copied your comment from the email below and left a response for it.
"I discussed with Baha the high-level overview of the current implementation, and I'll send him the dataset for the English version by tomorrow.
For the English dataset, should I send him a version generated from the complete dataset? Or do we need, for some reason, to hold out a portion to use as a test set?"
@Tizianopiccardi: @Bmansurov (WMF): can you expand on what the items in the dataset will be? That way we can figure out whether some data needs to be kept aside. --LZia (WMF) (talk) 21:45, 8 January 2018 (UTC)
- The data will be similar to the French data downloadable here, something like this:
{"category":"Catégorie:Ville_de_Souss-Massa-Drâa","recs":[{"relevance":0.3333333333333333,"title":"Notes et références"},{"relevance":0.3333333333333333,"title":"Voir aussi"},{"relevance":0.2222222222222222,"title":"Démographie"},{"relevance":0.2222222222222222,"title":"Économie"},{"relevance":0.1111111111111111,"title":"Infrastructures"},{"relevance":0.1111111111111111,"title":"Culture"},{"relevance":0.1111111111111111,"title":"Population"},{"relevance":0.1111111111111111,"title":"Manifestations"},{"relevance":0.1111111111111111,"title":"Vue d'ensemble"},{"relevance":0.1111111111111111,"title":"Climat"}]}
Bmansurov (WMF) (talk) 21:59, 8 January 2018 (UTC)
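For anyone consuming these files, a minimal sketch of reading records shaped like the example above: each record maps a category to a list of recommended section titles with a relevance score. The sketch assumes one JSON record per line (JSON Lines); the actual file layout on the /Data page may differ, and the function name `load_recommendations` is hypothetical.

```python
import json
from typing import Dict, Iterable, List


def load_recommendations(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Map each category to its recommended section titles, most relevant first.

    Assumption: one JSON record per line, each with "category" and "recs" keys
    as in the French example; blank lines are skipped.
    """
    by_category: Dict[str, List[str]] = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore empty lines
        record = json.loads(line)
        # Sort by relevance, highest first (ties keep their original order).
        recs = sorted(record["recs"], key=lambda r: r["relevance"], reverse=True)
        by_category[record["category"]] = [r["title"] for r in recs]
    return by_category


# Abbreviated sample built from the record above, deliberately out of order.
sample = [
    '{"category":"Catégorie:Ville_de_Souss-Massa-Drâa","recs":['
    '{"relevance":0.1111111111111111,"title":"Climat"},'
    '{"relevance":0.3333333333333333,"title":"Notes et références"},'
    '{"relevance":0.2222222222222222,"title":"Démographie"}]}',
]
recs = load_recommendations(sample)
print(recs["Catégorie:Ville_de_Souss-Massa-Drâa"][:2])
# ['Notes et références', 'Démographie']
```

Sorting on load keeps lookups simple: taking the top k sections for a category is just a slice of the stored list.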
- @Bmansurov (WMF): I updated the /Data page with the English version Tizianopiccardi (talk) 19:37, 9 January 2018 (UTC)
- @Tizianopiccardi: Thanks! Does the data contain everything, or did you keep some of it for testing? Bmansurov (WMF) (talk) 19:47, 9 January 2018 (UTC)
- @Bmansurov (WMF): Everything, in both cases (FR | EN). I assume that if we need to run an evaluation, it's better to run an experiment with humans. Automatic evaluation is not very meaningful in this case. Tizianopiccardi (talk) 14:16, 10 January 2018 (UTC)