Talk:Mix'n'match/Archive 1
This is an archive of past discussions. Do not edit the contents of this page. If you wish to start a new discussion or revive an old one, please do so on the current talk page.
Before using
You have sent messages about Mix'n'match to 172 Wikipedias, but with practically no explanation. I have some questions, but first of all I would like to ask whether this tool will be maintained and developed for a sufficient period of time. Otherwise it is not worthwhile to bother. — Ace111 (talk) 18:35, 10 October 2014 (UTC)
- I've not added many explanations because those can be added here (and translated, if necessary). There's also a blog post, linked from the page. Your question is interesting because I would never have thought of writing something about that, no matter how long a message I had written. :-)
- Can you explain better what makes you think a tool is not worthwhile using if it's not maintained for long? We have hundreds of tools on Toolserver and Labs, there are no promises on their maintenance or they wouldn't be tools. If you specify what you want maintenance for, then perhaps Magnus can tell you whether he can promise that; or we can look for more maintainers to be able to reach that level.
- If I were a user worried about the tool disappearing tomorrow, I'd just use it for the pretty statistics (fr.wiki more complete than de.wiki? how many women scientists in my wiki? how can it.wiki lack hundreds of articles on medieval authors etc.?) and, as the blog post says, for the "lists of red links on steroids". If you create an article following a red link, you usually don't ask about preservation of the red link, only of the article. :) --Nemo 21:24, 10 October 2014 (UTC)
Updating Wikidata from catalogs
Once we've added a Q number to a catalog item, does a bot follow along and add the catalog statement to the Wikidata item, or do we need to do that by hand? - PKM (talk) 21:02, 12 November 2014 (UTC)
- Not all the catalogs even have a property on Wikidata, so the answer certainly can't be "yes" in general. I have no idea about the catalogs which do have a Wikidata property for their identifiers. --Nemo 21:13, 12 November 2014 (UTC)
- Okay, thanks. - PKM (talk) 01:09, 15 November 2014 (UTC)
- Such props are "instance of" Wikidata property for authority control, eg see ULAN identifier. Most of them also have "subject item" pointing to the respective catalog, eg Union List of Artist Names. But either it's not possible to query for properties in WDQ, or my WDQish is not good yet.
- Here's the list that Reasonator uses.
- Yes, it's possible to update Wikidata from Mix-n-match: press "Y" next to the catalog name. Eg for ODNB the link is: https://tools.wmflabs.org/mix-n-match/?mode=sync&catalog=1 --Vladimir Alexiev (talk) 12:57, 14 January 2015 (UTC)
Now Mix-n-match adds the coreferenced id as claim immediately. --Vladimir Alexiev (talk) 18:50, 13 February 2015 (UTC)
British Museum person-institution thesaurus
@Magnus Manske: The British Museum person-institution thesaurus has 176461 entries that are not coreferenced to anything in the world. I think they'd see it as a major win if the community helps them to coreference.
This could be followed by importing the 2.5M cultural objects of the BM.
- table of all BM thesauri: https://confluence.ontotext.com/display/ResearchSpace/Meta-Thesaurus+and+FR+Names (but also includes entries for partial Yale, ULAN, RKD: don't be fooled)
- download: bmPerson-institution-better.tsv.gz
- fields: url name otherNames type gender note profession nationality birth death
- fetched from 5 queries here [1], their results assembled with unix join (person-institution.sh). Note: the endpoint used is temporary and won't be there long
Cheers! --Vladimir Alexiev (talk) 15:40, 25 January 2015 (UTC)
- Use the data above, not the query below***
- SPARQL query returns url name otherNames type gender profession nationality birthDate deathDate note
prefix ecrm: <http://erlangen-crm.org/current/>
select ?x ?name ?otherNames ?type ?gender ?profession ?nationality ?birth ?death ?note {
  ?x skos:inScheme id:person-institution.
  {select ?x ?name (group_concat(?other; separator="; ") as ?otherNames)
    {?x skos:inScheme id:person-institution; skos:prefLabel ?name.
     optional {?x ecrm:P131_is_identified_by/rdfs:label ?other filter(?other != ?name)}}
   group by ?x ?name}
  bind(if(exists{?x a ecrm:E21_Person},"Person","") as ?type)
  optional {?x bmo:PX_gender ?gender1.
    bind(strafter(str(?gender1),"gender/") as ?gender2)
    bind(if(?gender2="m","male",?gender2) as ?gender)}
  {select ?x (group_concat(?prof1; separator="; ") as ?profession)
    {?x skos:inScheme id:person-institution; bmo:PX_profession [rdfs:label ?prof1]}
   group by ?x}
  {select ?x (group_concat(?nation1; separator="; ") as ?nationality)
    {?x skos:inScheme id:person-institution; bmo:PX_nationality [skos:prefLabel ?nation1]}
   group by ?x}
  {select ?x (group_concat(?birth1; separator="; ") as ?birth)
    {?x skos:inScheme id:person-institution; ecrm:P92i_was_brought_into_existence_by [ecrm:P4_has_time-span [rdfs:label ?birth1]]}
   group by ?x}
  {select ?x (group_concat(?death1; separator="; ") as ?death)
    {?x skos:inScheme id:person-institution; ecrm:P93i_was_taken_out_of_existence_by [ecrm:P4_has_time-span [rdfs:label ?death1]]}
   group by ?x}
  optional {?x ecrm:P3_has_note ?note}
}
- WARNING: This query causes a stack overflow on http://collection.britishmuseum.org/sparql after 8423 records, so DON'T run it there.
- The result is JSON (a little ugly):
{ "nationality": { "type": "literal", "value": "" }, "name": { "type": "literal", "value": "Francisco Pizarro" }, "profession": { "type": "literal", "value": "" }, "gender": { "type": "literal", "value": "male" }, "type": { "type": "literal", "value": "Person" }, "deathDate": { "type": "typed-literal", "datatype": "http:\/\/www.w3.org\/2001\/XMLSchema#date", "value": "1541-01-01" }, "note": { "type": "literal", "value": "Conquistador; born Trujillo, Castile, conqueror of Peru and founder of Lima (1533). \n\nHe is subject of ..." }, "otherNames": { "type": "literal", "value": "Pizarro, Francisco" }, "x": { "type": "uri", "value": "http:\/\/collection.britishmuseum.org\/id\/person-institution\/113757" } },
Cheers! --Vladimir Alexiev (talk) 18:09, 16 January 2015 (UTC)
A CSV option
The nice folks at http://finds.org.uk have provided CSV dumps of the thesauri at https://github.com/findsorguk/bmThesauri.
- bmPerson-InstitutionThesauri.csv provides fields URI,Label,ScopeNote,Gender,Nationality while the above provides more, in particular otherNames and years.
- Nationality is a multivalue field, which is not handled properly. We can find instances with multiple values on the SPARQL endpoint:
select * {?x bmo:PX_nationality ?n1,?n2 filter(str(?n1)<str(?n2))}
- Eg http://collection.britishmuseum.org/id/person-institution/142680 is Afghan, German, and Indian
- So we find 3 rows for him in the CSV:
grep 142680 bmPerson-InstitutionThesauri.csv
- I think they used these queries: https://github.com/findsorguk/BritishMuseumSparql/blob/master/person-institution.txt. So I posted https://github.com/findsorguk/BritishMuseumSparql/pull/1 with my query
- The CSV has 179813 rows and there are about 3429 "duplicates" with multiple nationalities, so 176384 persons. Pretty close to the total of 176461.
- @Vladimir Alexiev: I have imported the CSV set into Mix'n'match as "BMT" (British Museum Thesaurus). Dumb automatching finds ~15% of entries. Does this have a Wikidata property? If not, should it? --Magnus Manske (talk) 16:15, 20 January 2015 (UTC)
- Ah, I saw (and supported) your property proposal! :-) --Magnus Manske (talk) 16:32, 20 January 2015 (UTC)
- @Magnus Manske: So quick, great! But looking in MnM, many entries have just a name, so it's a poor basis for making decisions. Don't you need also at least the years and otherNames? --Vladimir Alexiev (talk) 15:47, 21 January 2015 (UTC)
- Well, I imported all of the CSV, and the JSON is invalid format, so... --Magnus Manske (talk) 10:53, 22 January 2015 (UTC)
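The multi-valued Nationality problem described above (one CSV row per nationality, hence the "duplicates") could be worked around before import by collapsing rows that share a URI. The following is only a minimal sketch: the column names URI, Label, ScopeNote, Gender and Nationality are taken from the field list above, while the function name `collapse_rows` and the sample rows are made up for illustration and are not real BM data.

```python
import csv
import io
from collections import OrderedDict


def collapse_rows(csv_text, key="URI", multi="Nationality", sep="; "):
    """Merge rows sharing the same key, joining the multi-valued column."""
    merged = OrderedDict()
    for row in csv.DictReader(io.StringIO(csv_text)):
        uri = row[key]
        if uri not in merged:
            # First time we see this URI: keep the row, start a value list.
            merged[uri] = dict(row)
            merged[uri][multi] = [row[multi]] if row[multi] else []
        elif row[multi] and row[multi] not in merged[uri][multi]:
            # Repeated URI: only collect the additional nationality.
            merged[uri][multi].append(row[multi])
    for row in merged.values():
        row[multi] = sep.join(row[multi])
    return list(merged.values())


# Hypothetical sample rows (not taken from the real CSV).
SAMPLE = """URI,Label,ScopeNote,Gender,Nationality
http://example.org/p/1,Person One,,male,Afghan
http://example.org/p/1,Person One,,male,German
http://example.org/p/1,Person One,,male,Indian
http://example.org/p/2,Person Two,,female,English
"""

people = collapse_rows(SAMPLE)
print(len(people))
print(people[0]["Nationality"])
```

Applied to the real file, the number of merged rows should come out near the 176384 persons estimated above, assuming the only difference between the repeated rows is the nationality.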
Tweeting and Blogging
"#wikidata has the power to make large-scale #coreferencing between Authority Files work"
Tweeted about this affair https://twitter.com/search?q=%23coreferencing, please retweet:
- VIAF investigations: https://twitter.com/valexiev1/status/557928238673297408
- ULAN investigations: https://twitter.com/valexiev1/status/557928905706061824
- BMT MnM image: https://twitter.com/valexiev1/status/557925594114306048
- ULAN MnM image: https://twitter.com/valexiev1/status/557924523358846976
- GND prop metadata image: https://twitter.com/valexiev1/status/557924071154151425
- Authority Control image: https://twitter.com/valexiev1/status/557923638490693633
Filter out Disambiguation entries and Un-notable Persons
I wrote about this to Jane: https://commons.wikimedia.org/wiki/User_talk:Jane023#Don.27t_Coreference_Disambiguation_pages
Posted as https://bitbucket.org/magnusmanske/mixnmatch/issue/3
@Magnus Manske: We should match persons to persons, not disambiguation pages to persons or to disambiguation pages.
- RKD artists has disambiguation pages: both RKD artists:Il Bambaia and RKD artists:Bambaia say "See: Busti, Agostino". Don't know if you can recognize them, but if you can: please filter them out.
- The matching algorithm should not select Wikimedia disambiguation pages as candidates
- The Mix-n-match UI should not allow coreferencing to Wikimedia disambiguation pages (or at least should warn "Please read the disambiguation page and select one of the persons there")
The BM Thesaurus has a bunch of entries where the description contains "Issued tokens", eg
- Henry Turner: Person; male; Issued tokens. Baker and possible member of the Bakers' Company.; retailer/tradesman; English
- H Tuttle: Issued tokens (Lowestoft).; English
- Simon Turner: Person; male; Issued tokens. Possible member of the Grocers' Company. Possibly associated and possibly neighbouring a tavern called The Pie or The Magpie.; retailer/tradesman; English
These are minor tradesmen or pub owners that coined some tokens. 100% of the 20-30 ones I checked are not on WD, not on Wikipedia, and unlikely to have any notability. So please filter them out from this catalog.
— The preceding unsigned comment was added by Vladimir Alexiev (talk) 19:15, 13 February 2015 (UTC)
- This feature has "always" existed: each entry has a button to mark it unsuitable for Wikidata. --Nemo 06:42, 14 June 2015 (UTC)
- @Nemo bis: the problem is that a big percent of BM entries are such minor tradesmen. Would be great if @Magnus Manske: can filter out entries having substring "Issued tokens" automatically --Vladimir Alexiev (talk) 11:36, 20 January 2016 (UTC)
- Well, there is an issue specifically with RKD artists: their xrefs (not: disambiguation) have IDs of their own and they may well add up to more than 50% of all identifiers. Fortunately tens of thousands of xrefs are already marked as N/A, but their sheer number makes it impossible to leaf through the N/As in order to spot and inspect entries declared N/A for any other reason. So specifically for RKDartists one would wish that these xref entries had been kept out of the original import of the data set, or that one had an "xref" tag different from "N/A" but identical in functionality...
- WRT disambiguation pages on Wikidata: Many of them are wrongly tagged in Wikidata and some datasets like DMNES have a strong affinity to disambiguation pages and I don't see any means of "solving" that. -- Gymel (talk) 13:09, 14 June 2015 (UTC)
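The automatic filter requested above for the BM catalog amounts to a substring test on the entry description. A minimal sketch, assuming entries are available as (name, description) pairs; the function name `is_token_issuer` is hypothetical and the third sample entry is made up as a counter-example:

```python
def is_token_issuer(description, marker="Issued tokens"):
    """True if the description contains the marker, ignoring case."""
    return marker.lower() in description.lower()


# The first two descriptions are quoted from the examples above;
# the third is an illustrative entry that should be kept.
entries = [
    ("Henry Turner", "Person; male; Issued tokens. Baker and possible member "
                     "of the Bakers' Company.; retailer/tradesman; English"),
    ("H Tuttle", "Issued tokens (Lowestoft).; English"),
    ("Francisco Pizarro", "Person; male; Conquistador"),
]

kept = [name for name, desc in entries if not is_token_issuer(desc)]
print(kept)
```

Whether such entries should be dropped at import time or only pre-marked as unsuitable is a separate decision; the test itself is the same either way.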
Coreference AAT
@Magnus Manske: Can we coreference AAT, which is a concept (not person) thesaurus?
- I made an export file: https://www.wikidata.org/wiki/Wikidata:WikiProject_Authority_control#Coreference_AAT_with_Mix-n-Match
- How well would your automatic matching work for concepts?
- If not: there are 15k out of 40k coreferenced to Wordnet that we could salvage through BabelNet, should I work on that?
Oh, I now see AAT is already added! I can swear it wasn't there 2 days ago :-)
- 11% are auto-matched, I will look at how good they are
- Parent levels in brackets are omitted, probably because you don't escape the brackets (Done). Eg AAT:dimidiating rhyta
- has: "rhyta, drinking vessels, <vessels for serving and consuming food>, <containers for serving and consuming food>, culinary containers, <containers by function or context>, containers (receptacles), Containers (Hierarchy Name), Furnishings and Equipment (Hierarchy Name), Objects Facet"
- but you only show: "rhyta, drinking vessels, , , culinary containers, , containers (receptacles), Containers (Hierarchy Name), Furnishings and Equipment (Hie"
- Maybe don't cut-off the parents?
- it would be too much to show the Scope Note at the first go. But we need it in a tooltip or something!
- Eg for AAT:underlayments, it's not enough to see parents "floor components, , surface elements (architectural), "
- To figure out which item (if any) on the disambiguation page https://en.wikipedia.org/wiki/Underlayment is a match, we need the scope note.
- It's way too inconvenient to get the scope note ("Plywood, hardboard, or other material placed on a subfloor to provide a smooth, even surface for applying the finish.") from Getty's site because it's 3 clicks away, and the language is not indicated (so I first hit the Chinese note that is no use to me).
- So please load the scope notes from my export file
--Vladimir Alexiev (talk) 10:56, 2 April 2015 (UTC)
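If the bracketed parent levels disappear because the angle brackets are interpreted as HTML tags (which is only the guess made above, not a confirmed diagnosis), escaping the string before display would keep them visible. A minimal sketch using Python's standard `html.escape`; the sample string is a shortened version of the hierarchy quoted above:

```python
import html

parents = ("rhyta, drinking vessels, <vessels for serving and consuming food>, "
           "culinary containers")

# html.escape turns "<" and ">" into "&lt;" and "&gt;" so a browser
# shows them as text instead of treating them as unknown tags.
escaped = html.escape(parents)
print(escaped)
```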
@Magnus Manske: said adding the scope note as a tooltip would require major changes. Then please show it in addition to the parents, it really is important for accurate matching --Vladimir Alexiev (talk) 13:46, 9 April 2015 (UTC)
I made an evaluation of the current automatic matches at https://docs.google.com/spreadsheets/d/1n7BFmIGiBhKUSarXWBd88q3NgSdhJurhdyoyXAmRghA/edit?usp=sharing
- Looked at the first 25 matches, which seem to be randomly distributed
- Precision is about 50%
- 2/3 of the incorrect matches (8 of 12) are due to WD named entities (albums, locations, books). AAT includes no named entities, so such matches are outright impossible. WD has no explicit class for "named entity" but one could try to filter by high-level classes such as Human, Location, Work.
- In addition to Wikipedia, correct matches include Wikipedia categories and Wikisource (a dictionary article)
- Recall estimation: if half of the 11% WD auto-matched are correct, that makes 5.5% or 2.2k
- That's 15% of AAT-Wordnet corefs (15k)
- It's 7-9% of the potential matches (I believe that 25-30k of all AAT concepts are present in WD)
- Could also add alternative labels, and labels in other languages: will this help the matching?
I'll try to salvage AAT-Wordnet-Wikipedia corefs through BabelNet: https://www.wikidata.org/wiki/Wikidata:WikiProject_Authority_control#Coreference_AAT_through_BabelNet . --Vladimir Alexiev (talk) 14:35, 3 April 2015 (UTC)
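The recall estimate above can be re-derived step by step. This sketch only repeats the arithmetic with the figures quoted in this thread (40k AAT concepts, 11% auto-matched, about 50% precision, 15k AAT-Wordnet corefs, 25-30k potential matches); the variable names are illustrative.

```python
aat_concepts = 40_000          # total AAT concepts mentioned above
auto_matched = aat_concepts * 11 // 100   # 11% auto-matched
correct = auto_matched // 2    # about half of them are correct

wordnet_corefs = 15_000
potential_low, potential_high = 25_000, 30_000

share_of_wordnet = round(100 * correct / wordnet_corefs, 1)
share_low = round(100 * correct / potential_high, 1)
share_high = round(100 * correct / potential_low, 1)

print(auto_matched, correct)   # 4400 2200
print(share_of_wordnet)        # 14.7, i.e. roughly 15%
print(share_low, share_high)   # 7.3 8.8, i.e. roughly 7-9%
```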
Faulty autodescribe?
Mix-n-match describes Q190928 shipyard (in match against AAT:shipyards) as "Construction site, dock, and organization in Russia; places where ships are repaired and built".
But only the second of these descriptions is in Wikidata. So where does "Construction site, dock, and organization in Russia" come from?
- It is auto-generated from the statements. If it's wrong, fix the statements :-) --Magnus Manske (talk) 18:29, 5 April 2015 (UTC)
@Magnus Manske: Margaret Busby is auto-described as "born in 2000" or "*2000". But her birthdate statement says "20. century". 2000 is the last year of the 20th century, to be sure, but it's quite misleading. Can you auto-describe using '*20. century'? Runner1928 (talk) 17:49, 20 November 2015 (UTC)
Just another example where the auto-describe of a century date makes things look funky: Leocadia. Born 300 (really 3rd century), died 304. --Dcheney (talk) 12:25, 5 November 2018 (UTC)
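The two cases above suggest that the auto-description could take the precision of the date into account. A minimal sketch, assuming the Wikidata time model's precision codes (7 = century, 9 = year) and, as described in this thread, that a century-precision value carrying the year 2000 means the 20th century; the function names are hypothetical and only positive years are handled.

```python
def ordinal(n):
    """English ordinal for a positive integer, e.g. 3 -> '3rd'."""
    if 10 <= n % 100 <= 20:
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"


def describe_date(year, precision):
    """Render a year according to its precision (7 = century, 9 = year)."""
    if precision == 7:
        # Years 1901-2000 map to 20, years 201-300 map to 3.
        century = (year + 99) // 100
        return f"{ordinal(century)} century"
    return str(year)


print(describe_date(2000, 7))  # 20th century
print(describe_date(300, 7))   # 3rd century
print(describe_date(304, 9))   # 304
```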
Problem with Catholic Hierarchy's catalogues?
It seems the two CH catalogues - the one about bishops and the one about dioceses - do not automatically transfer the manually-confirmed data to Wikidata, while this happens with other catalogues. Can/Should it be fixed? --Sannita - not just another it.wiki sysop 00:12, 3 April 2015 (UTC)
- I didn't even know of such a sync. Is that what d:user:Reinheitsgebot does? --Nemo 06:45, 3 April 2015 (UTC)
- If you hover over a specific catalogue, there should be a sync option somewhere. Simply gives a filled in QuickStatements-page. You can also sync values from Wikidata. Sjoerd de Bruin (talk) 18:16, 3 April 2015 (UTC)
- @Nemo bis: It seems that it isn't. :/
- @Sjoerddebruin: Believe me or not, if you hover on those two particular catalogues there's no "sync" link at all. --Sannita - not just another it.wiki sysop 20:17, 3 April 2015 (UTC)
- The sync link only appears for catalogs that have a "direct" property, and for those that have a "hacked property" (via qualifier, e.g. CE1913). I try to not use the hacked one anymore; if there is a property for CH (bishops or dioceses), please let me know! --Magnus Manske (talk) 18:28, 5 April 2015 (UTC)
- @Magnus Manske: Thank you very much for your answer. :) Actually there's P1047 for the bishops, for the dioceses I'll check if there's any property proposal, and if not, I'll ask it. --Sannita - not just another it.wiki sysop 09:32, 7 April 2015 (UTC)
Sort the catalogs
@Magnus Manske: 54 catalogs, wow that is great progress! But please sort them, since it becomes quite hard to see what's there. --Vladimir Alexiev (talk) 13:48, 9 April 2015 (UTC)
- They are sorted by the catalog ID; AAT is first, YourPaintings last :-) --Magnus Manske (talk) 13:53, 9 April 2015 (UTC)
Search in a few catalogs
Hi Magnus, it's great that we can now exclude certain catalogs from the search, but could you also add a similar link that defaults to checking only one catalog instead of all of them? For some names, like "Smith", it would be nice to be able to search in a single catalog at a time. Thx Jane023 (talk) 06:53, 5 May 2015 (UTC)
- On the catalog page, in the top row with all the links, there is now "search in this catalog only". Example. --Magnus Manske (talk) 14:04, 6 May 2015 (UTC)
SBN identifier
@Magnus Manske and Sannita: On Wikidata we already have the SBN identifier, the authority code from the Italian national library service. However, there are still few items with that code, probably all added manually. How about adding this dataset to mix'n'match? All existing records can be extracted from here by putting 2015 in the second form (the one with "a:") of "Morte/Fine". An example is the Aafjes Bertus record, which has a little bio plus the ID: IT\ICCU\SBLV\026326. --AlessioMela (talk) 13:54, 12 May 2015 (UTC)
- Importing them now. Will take a while, as I have to scrape every page individually, and there are ~65K... --Magnus Manske (talk) 17:16, 12 May 2015 (UTC)
- Yes, no API and thousands of pages to scrape, but you're great! Thanks! --AlessioMela (talk) 17:41, 12 May 2015 (UTC)
- Note, that's actually only 4% of the total SBN identifiers, i.e. those which are public (and that VIAF is supposed to have as well). See also [2] (in Italian). --Nemo 19:15, 12 May 2015 (UTC)
- True, but I haven't found the full list of identifiers (is there one?). The site exposes only the authority files, our 65k records, i.e. the 4%. Moreover, the WD property works well only with "the 4%" because it links to those records. The other 96% doesn't really have a front view on the SBN site (the trick of showing book lists was rejected on WD). --AlessioMela (talk) 19:38, 12 May 2015 (UTC) I mean, as I read in the ML you linked, that is more SBN's problem, because they expose just a small part of their identifiers (and in such an unfriendly way).
- @Magnus Manske: Thank you very much. I'm working to convince ICCU to release more of those identifiers. This will definitely work on our side. ;) --Sannita - not just another it.wiki sysop 12:26, 13 May 2015 (UTC)
Fossilworks
Would it be possible to add http://fossilworks.org/ to mixnmatch? I have been in contact with John Alroy and he welcomes the incoming links. One can get the XML information by using:
The ID that is inside of the XML tags already has a property: d:Property:P842. Currently 8538 of 319948 taxa on Fossilworks are matched. But we probably only have a fraction of that number on Wikidata. --Tobias1984 (talk) 19:12, 12 May 2015 (UTC)
- Hmm. They have an extensive download form, but I can't seem to get a simple list of all their taxa with IDs. Ideas? --Magnus Manske (talk) 15:52, 28 May 2015 (UTC)
NNDB links to other authorities, including Wikipedia
@Magnus Manske: I noticed that NNDB has a "bibliography" for each entry. For example Patrick Stewart has an entry and a bibliography. At the bottom of the latter, there is a list of authority links, including Wikipedia. Could we use this information both for matching with Wikidata and for obtaining matches with other IDs in one shot? --AlessioMela (talk) 13:55, 20 May 2015 (UTC)
- Could do, though that means scraping >40K pages... I'll put it on the list. --Magnus Manske (talk) 15:46, 28 May 2015 (UTC)
Issue with Dictionary of Art Historians sync
The matching is now reported as complete again; however, #2 of the duplicates shown at the DAH sync page is Q84305, which therefore should have two instances of property P1343 but hasn't got any! -- Gymel (talk) 05:53, 29 May 2015 (UTC)
Feature Request: Assignment "coupled" entries
If you search for "Amy Katherine Browning", there currently are three entries from three different sources, quite obviously the same person, and whether a Wikidata item exists or has to be created, they'll hopefully end up at the same item. When an item does already exist, I can assign them individually to that item; that's a mechanical repetition of an easy operation. It would now be nice to have at this place also an easy possibility to create a corresponding Wikidata item, like under "Creation Candidates" (or to have a method to steer "Creation Candidates" to a specific name): Marking the entries as "Not in Wikidata" would substantially increase the risk of creating duplicates in Wikidata (different catalogues will be sync'ed at different times and of course one does not re-check all the entries already marked as "not in Wikidata" over and over again), which may go unnoticed for a long time, and creating the appropriate item by hand is quite an effort one is not always inclined to take right at the moment. Actually, I decided to search for the name from the example in the context of sifting through the RKD artists list of entries marked N/A, resetting that name to "unmatched": The name seemed so significant and the RKD information so rich that I tried the search in other catalogues to see whether someone had happened to find a matching Wikidata item. Thus my "main context" was working through a specific list, and during that I didn't want to spend too much time creating even "important" new WD items or deciding which of the catalogues would be in shape for a "spontaneous" item creation run. Besides, ULAN also knows her as "Amy Katherine Dugdale", equally unmatched. So perhaps what's really needed is a tool to "connect" entries from different catalogues first and then finish with "instant creation" - or to equip entries with identifiers from additional catalogues on creation - as I understand these will turn into matches upon the next sync of the respective catalogues.
-- Gymel (talk) 13:42, 31 May 2015 (UTC)
- Trying to extract from large blob of text above: You suggest a "create item" link under search results, for an item with the name you just searched. I've just added that, it's straightforward enough. As a remark, I usually use "Search Wikidata" as the final step before creating an item, to ensure it doesn't already exist, and there is a "create item" link on the search results page. I'm not quite sure how you want to identify identical items across catalogs to "connect" them before creation though. --Magnus Manske (talk) 14:31, 31 May 2015 (UTC)
- Thanks, this already helps a bit. What I was aiming at, however (and sorry for the bloabber), was the possibility to create a Wikidata item for one of the entries I'm seeing in the search result. The steps were: I tried all the search possibilities and could not match. But additionally I have the strong suspicion that other catalogues should also contain this entry, and a (quite loosely formulated) text search supports this: Several other catalogues have that item, either still unmatched or marked as "not in WD" (or sadly "N/A"). When creating a meaningful item in that situation is made simple enough, I would be able to perform several parenthetic "cross-catalogue" assignments from the result list at hand. -- Gymel (talk) 18:10, 31 May 2015 (UTC)
- So what exactly would you want me to build? A link with every search result, to search for that entry's full name? Or something more generic, like just the last name (if it is, indeed, a name; there are other entry types in the catalogs)? How could I automatically collate "Johann Sebastian Bach", "Johann Bach", and "J. S. Bach"? --Magnus Manske (talk) 18:52, 31 May 2015 (UTC)
- Probably more in the lines of "Create item from this" for each result entry which is either not matched or marked "not in WD". My expectation is that this would prefill English label and description from the entry and insert the identifier of the selected entry into the item created: That would be a good starting point to manually connect other results from the query. I see that this might be a potentially dangerous offer to the casual user... -- Gymel (talk) 21:55, 31 May 2015 (UTC)
- Done. Warning in "hover title". --Magnus Manske (talk) 22:04, 31 May 2015 (UTC)
- Sorry for the delay. Testing it with RKDartists, it turned out that having Dutch phrases in English labels is not perfect, but changing the prefilled label is still easier than memorizing things and filling out blank forms, and personally I find the new solution really helpful. Thanks a lot. -- Gymel (talk) 22:25, 3 June 2015 (UTC)
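On the question raised in this thread of how "Johann Sebastian Bach", "Johann Bach" and "J. S. Bach" could be collated automatically: one very crude possibility is a loose key built from the last name and the first initial. This is only an illustrative sketch, not something the tool is known to do; the function name `loose_name_key` is hypothetical, and the key over-merges by design (for instance, "Johann Christian Bach" gets the same key), so it could at best propose candidates for a human to check.

```python
from collections import defaultdict


def loose_name_key(name):
    """Return (last token, first initial), both lowercased."""
    tokens = name.replace(".", " ").split()
    if not tokens:
        return ("", "")
    return (tokens[-1].lower(), tokens[0][0].lower())


names = ["Johann Sebastian Bach", "Johann Bach", "J. S. Bach", "Clara Schumann"]

groups = defaultdict(list)
for name in names:
    groups[loose_name_key(name)].append(name)

for key, members in groups.items():
    print(key, members)
```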
Syncing credits or responsibility
Earlier today I was quite surprised to be suddenly marked as responsible for a triplet in the NPG matching for "W. Allen" (Q6840743), because I still remembered having removed those matches the day before. It turned out to be kind of a tracking glitch: Mix'n'match seems to have missed my changes on Wikidata. When I synced the "connections on Wikidata, but not here", Mix'n'match "read" them back and marked me as responsible for the "match". Removing the matches again in Mix'n'match and performing an arbitrary edit on the Wikidata item resolved the issue, but wouldn't it make more sense to credit matches created by syncing Wikidata to M'n'm to a "Wikidata sync" pseudo-user rather than to the user who happened to perform the sync? I mean, there is no chance to assess those matches before the sync, and blaming Wikidata instead of the executing user might be more on the spot when it comes to cleaning up some mess. -- Gymel (talk) 22:36, 3 June 2015 (UTC)
Unsyncable entries, please stand up
RKDartists has tons of cross reference entries (Peter Miller see Miller, Peter), each with an identifier of its own. Fortunately, more than 75,000 are already marked as "N/A". However, some of them seem to have been matched to Wikidata before they were reassessed and marked as N/A. I have worked through the constraint report ("single value" violations) for the corresponding Property P650; the residue should correspond to the 78 "double Q's in this catalogue". However, the announcement "287 items not in this catalogue but in wikidata" stays constant when trying to sync.
My theory is that these are cases where only one identifier is stored in Wikidata and this identifier is marked as N/A in Mix'n'match (because it's a cross reference identifier, or an identifier yielding a 404 error page, or someone made an error). More generally perhaps, because syncing would create a conflicting assignment in Mix'n'match (unlikely in this case, because then syncing in the other direction would provoke a constraint violation on Wikidata). If so, it would be nice to have a list of these identifiers, in order to replace the values on Wikidata with the appropriate ones one gets when following the cross reference. Alternatively one can just wait until a subsequent independent matching of the appropriate entry makes itself visible by triggering a constraint violation message.
To get this list one would have to dump the Mix'n'match data for the given catalogue, export a beacon file for the corresponding property, and diff/cancel out all the entries with coinciding values to get a list of those Q-items with "surplus" values on Wikidata, which could then be pasted into autolist to yield something clickable to operate on (here I'm rather thinking of inspecting the items, not bulk-deleting the property). I imagine it would be quite easy to provide an "inspect" or "preview" link next to the "sync" button in Mix'n'match - activating that link would simply show a list of Q-items (and corresponding values) for the entries Mix'n'match announces as "syncable". -- Gymel (talk) 23:05, 3 June 2015 (UTC)
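The diffing step described above boils down to a set difference between the (item, identifier) pairs known on Wikidata and those matched in Mix'n'match. A minimal sketch, assuming both exports have already been parsed into pairs; the function name `surplus_on_wikidata` and all Q-ids and identifiers below are made up for illustration.

```python
def surplus_on_wikidata(wikidata_pairs, mnm_pairs):
    """Pairs present on Wikidata but without the same match in Mix'n'match."""
    return sorted(set(wikidata_pairs) - set(mnm_pairs))


# Hypothetical data: (Q-id, catalogue identifier)
wikidata_pairs = [("Q1", "100"), ("Q2", "200"), ("Q3", "300")]
mnm_pairs = [("Q1", "100"), ("Q2", "200")]

to_inspect = surplus_on_wikidata(wikidata_pairs, mnm_pairs)
for qid, identifier in to_inspect:
    print(qid, identifier)
```

The resulting Q-ids are what one would paste into a list tool for manual inspection, as suggested above.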
Manual for Mix'n'match
Just to let you all know that I wrote a manual for using Mix'n'match, since I received some requests for a how-to. Just a couple of notes:
- unfortunately, for the moment it's only in Italian, because it is my mother tongue and all the requests I had for a manual were from Italian-speaking users, but I promise I'll do my best in the next few days to start translating it into English as well. :) I also marked it for translation, so that more languages may follow;
- it only deals with the semi-automatic and manual games; all the remaining stuff is still to be done, so any help is welcome. :)
Let me know your thoughts about it! --Sannita - not just another it.wiki sysop 15:46, 11 June 2015 (UTC)
- I've put an English translation at Mix'n'match/en - I don't know if we want to keep this at a subpage or include it in the main entry, though. Andrew Gray (talk) 18:11, 19 June 2015 (UTC)
Merged items
What happens when a Wikidata item matched to a mix'n'match entry is merged to another item? Does mix'n'match follow redirects, once created? Does it continue believing in the deprecated ID? Does it report the change in some way? While working on BEIC ids I regularly find a few duplicates and merge them, but I worry that I'm "losing" them; same (and worse) for items I split, but that's rarer. --Federico Leva (BEIC) (talk) 06:35, 14 June 2015 (UTC)
- Adding the property to the redirected item will yield an error on syncing from M'n'M to Wikidata. Syncing from Wikidata to M'n'M for the identifier moved to the new target item won't happen as long as there is the old association in Mix'n'Match. Thus merging items will increment both the "not here" and "not in Wikidata" counters without any direct indication what the items or identifiers involved might be or that these are coupled in the sense of concerning the same id. There definitely is room for improvement, see my comment #Unsyncable entries, please stand up above.
- However I'm not sure if Mix'n'Match should implement some automatic remedy feature: How long should it wait (an erroneous merge could be reverted ~shortly~ after)? Who should be recorded as the person or process responsible for the match: The original match has been performed by X, the merge has been executed by some Wikidata user Y with no record on Mix'n'Match, the merge then has been tracked by some automated Mix'n'Match subcomponent Z: There is no clear answer (cf. my comment #Syncing credits or responsibility above).
- Repairing associations in other cases may be tedious and frustrating too: Usually they will be visible in either "double Q's in this catalog" or the corresponding constraint report in Wikidata: But assume the constraint violation is resolved by some Wikidata user not aware of Mix'n'match: there is the danger of the next sync re-inserting the same value into Wikidata, triggering the problem report for the same problem again. Generally, removing associations in Mix'n'Match does leave the id in the corresponding Wikidata item (you won't know what this was after the removal) and the next sync may import it back to Mix'n'match. So perhaps (for logged in users) on removal of a match the tool should ask "remove from Wikidata, too?" or even make this the default action... -- Gymel (talk) 13:31, 14 June 2015 (UTC)
Adding the USDA NDB catalog for matching
Would it be possible to add the USDA NDB catalog for matching on Mix N'Match ? The Excel catalog is available at http://www.ars.usda.gov/Services/docs.htm?docid=24912 (column A and B are relevant)
The USDA NDB property (P1978) to match the entries has just been created. --Teolemon (talk) 23:18, 6 July 2015 (UTC)
- Importing now. --Magnus Manske (talk) 10:39, 7 July 2015 (UTC)
- Thank you so much :-) Now, if they could have chosen more legible labels :-S --Teolemon (talk) 18:10, 29 July 2015 (UTC)
Adding the O*NET-SOC 2010 catalog for matching jobs and occupations
Thanks a lot for the USDA NDB catalog: the Open Food Facts community (including me) has started the matching work :-)
Would it be possible to add the O*NET-SOC 2010 catalog for matching on Mix N'Match ?
- The basic Excel catalog (under public domain) is available at
- https://www.onetcenter.org/taxonomy/2010/list.html
- https://www.onetcenter.org/taxonomy/2010/list.html/2010_Occupations.xls?fmt=xls (column A, B and C are relevant)
- a version with synonyms is available at http://www.bls.gov/soc/soc_2010_direct_match_title_file.xls
The SOC Occupation Code (2010) property (P919) to match the entries has existed for a while.
thank you so much,
--Teolemon (talk) 09:50, 12 July 2015 (UTC)
Adding the Klassifikation der Berufe 2010 (KldB 2010) for matching jobs and occupations
Would it be possible to add the "Klassifikation der Berufe 2010" catalog for matching on Mix N'Match ?
- The Excel catalog is available at the bottom of dewiki
- The KldB-2010 occupation code property (P1021) to match the entries has existed for a while.
thank you so much, --Teolemon (talk) 10:16, 12 July 2015 (UTC)
- @Teolemon: Done. --Magnus Manske (talk) 19:45, 5 August 2015 (UTC)
- @Magnus Manske:: Thanks Magnus. I'm trying to get the major job code systems (I've proposed the US for Mix N'Match above, and I will try to get ISCO, which is the international one). They come in really handy for statistical comparison between countries, and to break down barriers in job searches.--Teolemon (talk) 12:05, 21 August 2015 (UTC)
UK Parliament bio
Could you please add d:Property:P1996 to the tool? A basic list of all MPs is available at http://www.parliament.uk/mps-lords-and-offices/mps/ and Lords at http://www.parliament.uk/mps-lords-and-offices/lords/ and http://www.parliament.uk/mps-lords-and-offices/lords/-ineligible-lords/ although there are many more pages for former members of the legislature which it would be useful to link to, eg. Gordon Brown. Unfortunately I can't find a single comprehensive index of all of the pages but there may be a way of extracting such a list. Google finds around 1800 pages with URLs including the stem http://www.parliament.uk/biographies/. Thanks, Rock drum (talk · contribs) 16:36, 17 July 2015 (UTC)
- @Rock drum: Imported ~1400 here. --Magnus Manske (talk) 08:56, 16 September 2015 (UTC)
- @Magnus Manske: Thank you. Rock drum (talk · contribs) 15:52, 17 September 2015 (UTC)
ORCID in Mix'n'match
What kind of ORCID-indexed people are selected for inclusion in Mix'n'match? I suppose it is not the entire ORCID that is in Mix'n'match? — Finn Årup Nielsen (fnielsen) (talk) 12:35, 4 August 2015 (UTC)
- Just a tiny ORCID subset. Pre-selected to match names in Wikidata, IIRC. --Magnus Manske (talk) 19:46, 5 August 2015 (UTC)
Bug: Search for labels gets cut off at apostrophe
I’m working on the FRS list, and some of the fellows have an apostrophe in their name (for example, William O'Shaughnessy Brooke). For these names, the Search links are all broken: they’ll cut off at the apostrophe (in this case, search for William O). Can this be fixed? —Galaktos (talk) 11:13, 3 September 2015 (UTC)
- Should be fixed now. --Magnus Manske (talk) 13:14, 16 September 2015 (UTC)
- Great, thanks! —Galaktos (talk) 21:51, 16 September 2015 (UTC)
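The thread does not say what caused the truncation, but a search link that is cut off at an apostrophe is typical of a label being put into a URL or HTML attribute without escaping. A minimal sketch of the general remedy, assuming a hypothetical `search_link` helper (not the tool's actual code):

```python
from urllib.parse import quote


def search_link(label):
    # Hypothetical helper: percent-encode the whole label so that
    # characters such as ' cannot terminate the query early.
    return "https://www.wikidata.org/w/index.php?search=" + quote(label)


print(search_link("William O'Shaughnessy Brooke"))
```

With the label encoded, the apostrophe travels as `%27` and the full name reaches the search.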
Open Library identifier (d:Property:P648)
Please add support for Open Library on Mix'n'match. Their data is accessible within API or data dumps. Maybe Mix'n'match will be useful also on OL/IA front, since there are many duplicated entries for each person (some listed at d:Wikidata:Database_reports/Constraint_violations/P648#.22Single_value.22_violations). Lugusto • ※ 17:00, 13 September 2015 (UTC)
- The entire set is a little, well, large... I'll start with the authors. --Magnus Manske (talk) 09:14, 16 September 2015 (UTC)
- OK, there are 6.8M authors in OLID. That's more entries than in all other mix'n'match catalogs combined. Too many. I have imported ~333K entries that have both birth and death dates, for now. --Magnus Manske (talk) 11:17, 16 September 2015 (UTC)
Recognise URLs with explicit protocol on game mode
It would be nice that addresses with "http://" or "https://" could also be recognised as valid ones on game mode so that Q number could be set easily. --abián 19:21, 20 September 2015 (UTC)
Paris street code
Would it be possible to add the Paris digital street code (d:P:P630) to mix'n'match? There is a list of entries at http://www.v2asp.paris.fr/commun/v2asp/v2/nomenclature_voies/Voieactu/index.nom.htm Sadly, it appears that it is no longer updated, as there is a slightly longer list at http://opendata.paris.fr/explore/dataset/voiesactuellesparis2012/?tab=table but still it would be useful (a little more than 10% of entries are still missing in Wikidata, based on numbers returned by wdq). --Zolo (talk) 15:35, 10 October 2015 (UTC)
UNSPSC
Would it be possible to add the UNSPSC catalogue to Mix'N'Match? See P2167. If it would be helpful, I could convert the catalogue into your preferred format, but I cannot find any documentation of the Mix'N'Match upload format. See also this conversation. Cheers, Bovlb (talk) 19:06, 25 October 2015 (UTC)
Add ISOCAT
Could [3] be added? It corresponds to P2263. I would be willing to help, if needed. Popcorndude (talk) 15:06, 13 November 2015 (UTC)
Feature request : specify a set of candidate matching
Hi, I recently added this catalogue : https://www.wikidata.org/w/index.php?title=Wikidata:WikiProject_Movies/Properties&diff=282800978&oldid=276661177
I may have made a few mistakes myself in preparing the dataset, but users (especially User:Tinm, thanks to him) have a few complaints which could help make mix'n'match better.
A few complaints :
- the search does not work.
- Partly it's my fault and the labels I generated are bad for it. Do we have good guidelines like "the generated label should follow the Wikidata label conventions" ?
- Partly it seems to be because the dataset is a multilingual one, with English as the main language but with names taken from all over the world. This makes the default search inefficient, and doing the Google "all wikipedias" search systematically was too tedious for Tinm.
- The second big complaint is related to the fact that the dataset has far more ids than there are items. I'll suggest two things to handle this use case:
- First: I think it would help to restrict the candidates for matching Wikidata items to, say, a PagePile result set. Clearly it's inefficient to search in all wikipedias if we can already have a good candidate set to search in
- Second: a randomly chosen id is likely to have no item. So I think that when we have a good set of candidates for matching, it would be useful to reverse the way Mix'n'Match currently works, for example in game mode: picking one existing item, not one random id, and presenting a set of candidate ids ...
TomT0m (talk) 10:48, 12 December 2015 (UTC)
Remove match doesn't work?
Hi,
Sometimes I make a mistake when entering the Q number, but I notice right away that the article name is wrong. Then I do "remove match". But today I noticed that the removal hadn't really been performed. I have been using the tool only this month. Could you please check it out? --Joutbis (talk) 17:41, 24 December 2015 (UTC)
Feature request: jump to an article number
Hi,
Would it be possible to jump to a given article number? I find I'm only doing words with an "A".--Joutbis (talk) 17:49, 24 December 2015 (UTC)
FAST catalog
Hello, I would like to add the FAST catalog (property FAST-ID (P2163) on Wikidata) to mix'n'match. From their latest data dump, I've created a tsv-file for import into mix'n'match containing all persons from this catalog: [4] (about 60.5 MiB, 783.076 lines). Because that's quite big, here is the same file split into four chunks of max. 20 MiB: [5] [6] [7] [8]
The rows contain three tab-separated columns with the FAST-ID, the person's name and a description. The description consists of the gender, birth- and death-dates, affiliation, Wikidata-ID as found in the FAST database, LCAuth-ID (P244) as found in FAST, VIAF-ID (P214) as found in FAST.
In the future I might add more types of data (geographic, organizations, …) from the FAST database, but for now the person data should be enough work ;-). --Floscher (talk) 18:17, 8 January 2016 (UTC)
NCES District ID
Magnus, we now have P2483 on Wikidata, the NCES District ID. I believe most of the school districts in the United States already exist in Wikidata. Would you be able to get a recent dataset from Local Education Agency (School District) Universe Survey Data to put into Mix-n-Match for P2483? That ID is also known as LEAID in the flat files available from NCES. The columns LSTREE, LCITY, LSTATE, and LZIP are Location address columns (as opposed to Mailing address columns that begin with M) and can be used to disambiguate school districts. Thanks in advance for considering. Runner1928 (talk) 19:46, 11 February 2016 (UTC)
6DEG has now a Property
Catalogue 107 (6 degrees of Francis Bacon) now has a wikidata property: d:Property:P2401 and matches could be transferred. -- Gymel (talk) 21:46, 29 February 2016 (UTC)
Geni.com profile ID property
Geni.com now has a property in wikidata (d:Property:P2600) so maybe should be added to Mix'n'match. The majority of the Geni.com profiles are not notable people (it's a general-purpose genealogy site so everybody can create a profile), thus only items from wikidata should be searched in Geni.com, and not the reverse (trying to match all Geni.com profiles into wikidata). —surueña 20:41, 16 March 2016 (UTC)
- There are currently >3.1 million people in Wikidata. That would mean 3.1 million search requests to geni.com. I don't think they'd be too happy about that. Maybe they could do that internally, or on a geni dump, but as it is... --Magnus Manske (talk) 10:12, 8 April 2016 (UTC)
BVMC person ID
It would be great to have the property P2799 included in Mix'n'match. Please feel free to let me know if I can help with this in some way. --abián 18:44, 4 May 2016 (UTC)
- I had a brief look at their website, but couldn't find a data download, and didn't have time to figure out the SPARQL. If you can get me a simple file with their ID, person name, maybe a short description, and a URL or URL pattern, that would be great. You can also take that data and import it yourself! --Magnus Manske (talk) 08:48, 6 May 2016 (UTC)
- @Magnus Manske: You can download data for an ID (for example, 273) in:
- In particular, this tag could be helpful:
<nameOfThePerson rdf:datatype="http://www.w3.org/2001/XMLSchema#string">García Lorca, Federico</nameOfThePerson>
- IDs go from 3 to 99999. Null entries don't contain tags <identifierForThePerson> and <nameOfThePerson>. --abián 09:04, 6 May 2016 (UTC)
- Thanks! Do you think it is OK to hit their site 100K× to get all entries? --Magnus Manske (talk) 09:20, 6 May 2016 (UTC)
- I sent them an email warning that their site is being linked from Wikidata. So far I haven't received a reply, but I think you can start with, at least, the first thousands of IDs. Thank you in advance! --abián 10:00, 6 May 2016 (UTC)
- @Magnus Manske: Confirmed, you can go on with the process. :D --abián 09:35, 7 May 2016 (UTC)
Took a while, but now here. Some already matched via VIAF, name/date-based matching running now. Will sync initial matches to Wikidata. --Magnus Manske (talk) 09:46, 10 May 2016 (UTC)
- Thank you very much, Magnus. --abián 21:24, 10 May 2016 (UTC)
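For illustration, pulling the name out of one downloaded record, and recognising null entries by the missing tag, could look roughly like this. It is a sketch only: the function name is hypothetical, the record is assumed to be available as text, and a regular expression is used instead of a full RDF parser because the surrounding document structure is not shown in the thread.

```python
import re

# Matches the tag quoted above, whatever attributes it carries.
NAME_TAG = re.compile(
    r"<nameOfThePerson\b[^>]*>(.*?)</nameOfThePerson>", re.DOTALL
)


def extract_name(rdf_text):
    """Return the person's name, or None for a null entry (tag absent)."""
    match = NAME_TAG.search(rdf_text)
    return match.group(1).strip() if match else None


sample = (
    '<nameOfThePerson rdf:datatype="http://www.w3.org/2001/XMLSchema#string">'
    "García Lorca, Federico</nameOfThePerson>"
)
print(extract_name(sample))
```

When looping over the whole ID range, a pause between requests would keep the load on the remote site modest.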
ECARTICO person ID
It would be great to add ECARTICO to the Mix'n'match catalog database: a very valuable and useful collection of "structured biographical data concerning painters, engravers, printers, book sellers, gold- and silversmiths and others involved in the ‘cultural industries’" of the Dutch and Flemish Golden Ages. See more at http://www.vondel.humanities.uva.nl/ecartico/ . This project is supported by the University of Amsterdam and freely licensed under CC-BY-SA. Currently the database contains about 500 people, but these are "gold pages" (in my opinion).
See also Wikidata property proposal.
--Kaganer (talk) 17:39, 9 June 2016 (UTC)
- Now here. --Magnus Manske (talk) 09:08, 10 June 2016 (UTC)
- Thanks! --Kaganer (talk) 20:08, 13 June 2016 (UTC)
- Dear Magnus, I have one question: in the "auto-matched" mode, the dates of life of auto-matched Wikidata items are not displayed, although they are in Wikidata. Is this a known bug or a feature? If it is a feature, it is a very uncomfortable one. --Kaganer (talk) 00:16, 14 June 2016 (UTC)
- AutoDesc was down, restarted. --Magnus Manske (talk) 08:17, 14 June 2016 (UTC)
- I saw, thank you! --Kaganer (talk) 11:55, 15 June 2016 (UTC)
- As far as I know, ECARTICO already contains 1332 (and counting) links to Wikidata. See http://www.vondel.humanities.uva.nl/ecartico/apis/xlink.php?domain=wikidata . Were these links used for the auto-matches? If not, maybe the matches could be re-checked (I'm ready to watch and check the list of conflicts)? --Kaganer (talk) 13:57, 15 June 2016 (UTC)
- I associated the ones I have with Wikidata. But I didn't import the missing ones. Why are there >1000 in that list, when the website only lists ~500? --Magnus Manske (talk) 22:17, 15 June 2016 (UTC)
- I'm sorry, this was my mistake... The figure of 503 covers only people whose surname starts with "A" ;) "The database currently contains biographical data on 24 912 persons. Painters: 7 671, Engravers: 1 292, Booksellers, printers and publishers: 865, Gold- and silversmiths: 1 953, Sculptors: 209" --Kaganer (talk) 22:44, 15 June 2016 (UTC)
- By the way, this DB contains many non-notable persons (e.g. this): relatives of significant persons, included for the completeness of the genealogical connections. Maybe this dataset needs to be filtered based on whether the "Occupation(s)" field is filled? --Kaganer (talk) 22:54, 15 June 2016 (UTC)
OK, I now have ~21K entries, they are auto-matching now. Have not filtered them; they should get N/A status in mix'n'match. --Magnus Manske (talk) 14:44, 17 June 2016 (UTC)
- OK, thanks! Wikidata property also has been created. --Kaganer (talk) 21:18, 19 June 2016 (UTC)
- Added, and synced. --Magnus Manske (talk) 09:03, 20 June 2016 (UTC)
- How is the data re-checked against the source database? Is this performed manually? Some items were redirected at my request (3031 > 2921, 8807 > 8806). And it will happen again... --Kaganer (talk) 13:34, 22 June 2016 (UTC)
[ECARTICO] update from source
At the moment, mix'n'match does not update from the original source. Some catalogs could be updated with new entries, but there is no mechanism to change/remove entries in mix'n'match automatically. --Magnus Manske (talk) 14:04, 22 June 2016 (UTC)
- OK, but shouldn't there be a working process for manually adding/deleting some items? Maybe the request form for this kind of issue needs to be standardized? Such cases occur frequently, IMHO. --Kaganer (talk) 16:46, 23 June 2016 (UTC)
[ECARTICO] Search
One more question: with "Search only in this catalog", some existing items are impossible to find. For example:
- "Thomas Allen" (16292) is found successfully
- but his daughter "Mary Allen" (16291) cannot be found
--Kaganer (talk) 17:44, 23 June 2016 (UTC)
Completing Great Aragonese Encyclopedia (GEA)
I think that the first time that this tool automatically matched Wikidata items with GEA entries, it only checked a reduced number of entries.
Could it run over the complete GEA? Thanks in advance. --abián 15:26, 21 July 2016 (UTC)
- Done. --Magnus Manske (talk) 21:00, 21 July 2016 (UTC)
Smithsonian Museum of American History
Any chance you could add this to Mix n' Match http://americanhistory.si.edu/collections --HCShannon (talk) 14:54, 30 July 2016 (UTC)
- I'd be happy to, if you can find me a way to download or scrape (e.g. automatically browse a complete list) their data. --Magnus Manske (talk) 10:12, 1 August 2016 (UTC)
Upload more to a catalogue
I did a test and created catalogue WikiTree. Question Can I add more to this catalogue or do I need to create a new catalogue? Salgo60 (talk) 11:04, 18 August 2016 (UTC)
- As it is now, you will have to create a new catalog. The idea is that you would only upload catalogs that are complete (or as complete as possible at the time). --Magnus Manske (talk) 11:52, 18 August 2016 (UTC)
Russian encyclopedias
Can you add encyclopedias in Russian? Lists of articles: https://ru.wikipedia.org/wiki/%D0%9F%D1%80%D0%BE%D0%B5%D0%BA%D1%82:%D0%A1%D0%BB%D0%BE%D0%B2%D0%BD%D0%B8%D0%BA%D0%B8 --Ctac (talk) 18:05, 31 August 2016 (UTC)
- In principle, sure. However, (1) mix'n'match is designed to map entries on external sites to Wikidata (and thus, Wikipedia), and the pages I found following your link seem to be mostly linked names, without external IDs or URLs to point to. (2) Importing all of that would keep someone busy for a long time, and I am already busy ;-) If there is one you particularly would like in mix'n'match you can import them yourself, here. --Magnus Manske (talk) 11:12, 1 September 2016 (UTC)
UAI code (code for french schools)
Hi,
Is it possible to add these two files: Établissements d'enseignement secondaire (secondary schools) and Établissements d'enseignement supérieur (higher education) (Code UAI)? The property in Wikidata is P3202.
Tubezlob (talk) 15:22, 23 September 2016 (UTC)
Onisep occupation ID
Hi,
Is it possible to add this file: Liste des métiers ONISEP? The ID is $1 in this URL: http://www.onisep.fr/http/redirection/metier/identifiant/$1
and the property is P3214. Thank you! --Tubezlob (talk) 09:35, 1 October 2016 (UTC)
- Now here. Please note that this could have been imported by anyone using the import function. --Magnus Manske (talk) 08:39, 4 October 2016 (UTC)
- Thank you Magnus, I did not know. --Tubezlob (talk) 12:05, 5 October 2016 (UTC)
Supermodels.nl
Sorry, this was created twice. 'Supermodels.nl' is the one to delete. 'Supermodels' is the correct one. I hope this can be set. Thierry Caro (talk) 23:40, 6 November 2016 (UTC)
- There are also two small changes that should be done in the 'Réserves Naturelles de France' catalog. The alphanumeric identifier for 'réserve naturelle nationale de la grotte du T.M. 71' should be grotte-du-t.m.-71 and the one for 'réserve naturelle régionale du lac de Grand-Lieu' should be lac-de-grand-lieu-rnr. I wonder, by the way, if there is some regular checks about differences that may appear between Mix'n'match stored external IDs and Wikidata stored external IDs. Whatever, thank you for everything. Thierry Caro (talk) 13:31, 7 November 2016 (UTC)
- @Magnus Manske: 'New York magazine' can also be deleted because 'Model Manual' is the exact equivalent, and the latter has been completed. And then 'Model Manual' also needs some editing whatever. The external IDs have a URL-pattern problem with all the / replaced by %2F. Can you change this the other way around? Thanks for your help and sorry for all the issues with my importing catalogs. Thierry Caro (talk) 14:56, 11 November 2016 (UTC)
OK,
- "Supermodels".nl deactivated
- "New York magazine" deactivated
- I can't find the two "Réserves Naturelles de France" entries you mentioned
- I have fixed the "Model Manual" URLs
--Magnus Manske (talk) 16:18, 11 November 2016 (UTC)
- Thank you very much. You saved the day! Thierry Caro (talk) 21:46, 12 November 2016 (UTC)
- @Magnus Manske:. Model Manual may now be associated to the newly created P3379. Can you add this to your tool and export to Wikidata the data that results from the matches already established on Mix'n'match? That would be awesome. Thierry Caro (talk) 03:07, 2 December 2016 (UTC)
- Eventually, I have downloaded the matches and imported them to Wikidata myself. The only thing that remains to be done is adding the property to your tool so that future matches will be automatically reported there. Thanks again. Thierry Caro (talk) 03:37, 2 December 2016 (UTC)
- Sorry, busy. Added the property now. --Magnus Manske (talk) 14:38, 15 December 2016 (UTC)
LEI file
Hi Magnus - I've downloaded the openly available dump from GLEIF that lists over 400,000 corporations and other legal entities from around the world, and run a process to generate the tab format file that Mix n Match handles. However it's about 50 MB in size. Also it's not really a single language (the entity type can be in a number of different languages for instance, though usually English). Any suggestions on how best to handle this? I could split it up into smaller chunks by country if that would help. The associated property for the id is P1278 (Legal Entity ID) and only has 77 values currently set in Wikidata. ArthurPSmith (talk) 18:32, 14 November 2016 (UTC)
- Ok, I uploaded a US-only portion (about 120,000 records) and working with that for now. Let me know if I should do something different though. Thanks for this tool! ArthurPSmith (talk) 22:03, 15 November 2016 (UTC)
303 and 304 can be deleted
I seem to have a hard time encoding files so that they appear correctly here. I'm very sorry about this. Would you be OK to simply delete those two catalogs? Thierry Caro (talk) 07:44, 4 December 2016 (UTC)
- @Magnus Manske: The correct one, eventually, is 306. Thierry Caro (talk) 16:48, 4 December 2016 (UTC)
- And then 301 now has its own property, which is P3401. The matches have been exported to Wikidata. Thierry Caro (talk) 18:05, 11 December 2016 (UTC)
- You may also add P3404 to catalog 300. The matches are already on Wikidata. If possible, you may intervene on the database so that the links stored and to-be-generated will be such as http://www.vogue.fr/thevoguelist/thevoguelist/cindy-crawford/511 instead of http://www.vogue.fr/thevoguelist/thevoguelist/cindy-crawford%2F511. This is an encoding problem again. Thank you for all. Thierry Caro (talk) 23:40, 12 December 2016 (UTC)
Deleted 303 and 304, added properties to 300 and 301. --Magnus Manske (talk) 14:44, 15 December 2016 (UTC)
- Thank you. Thierry Caro (talk) 18:47, 21 December 2016 (UTC)
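The `%2F` problem mentioned in this thread is what percent-encoding produces when the slash inside an identifier is not treated as a safe character. A small sketch of the difference, where the `entry_url` helper and its `keep_slash` flag are hypothetical names chosen for the example:

```python
from urllib.parse import quote

BASE = "http://www.vogue.fr/thevoguelist/thevoguelist/"


def entry_url(external_id, keep_slash=True):
    # With safe="/" the slash inside the identifier survives;
    # with safe="" it is escaped to %2F, as in the broken links.
    safe = "/" if keep_slash else ""
    return BASE + quote(external_id, safe=safe)


print(entry_url("cindy-crawford/511"))
print(entry_url("cindy-crawford/511", keep_slash=False))
```

Which form is right depends on the target site; here the thread states that the unescaped slash is the working one.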
Add a property corresponding to the "FAO Races" catalog
The property d:Property:P3380 has been at last created that was intended to be the match of this catalog into Wikidata. I don't know (if I can/)how to add it afterwards, so please point me to a procedure or be kind to do this :) TomT0m (talk) 10:50, 4 December 2016 (UTC)
- Added, synced. --Magnus Manske (talk) 14:48, 15 December 2016 (UTC)
Kelvinator stove fault codes ???
Hi Magnus - currently every label (at least for the catalogs I tried - for example GRID) is displaying as "Kelvinator stove fault codes" and the description is always "0" in Mix n Match. Not good! Something broken ??? ArthurPSmith (talk) 19:41, 26 December 2016 (UTC)
FundRef dataset does not load
I would like to use this wonderful tool to fill wikidata:Property:P1905, which according to Wikidata should be done at https://tools.wmflabs.org/mix-n-match/?mode=catalog_details&catalog=62 , but this page keeps "Loading..." forever… Any idea why? − Pintoch (talk) 00:22, 19 January 2017 (UTC)
- I also have another question: say I have a dataset that not only contains identifiers, but also other useful information that could be added to the matched items. Is there a way to specify these statements in the catalogue, so that they are added to the matching item (if any)? Otherwise it is of course possible to do that with a bot, but it seems a bit overkill to write a bot from scratch for each dataset. − Pintoch (talk) 00:42, 20 January 2017 (UTC)
500 Server Error
Mix n Match seems to be broken! I hope it wasn't me! ArthurPSmith (talk) 21:37, 19 January 2017 (UTC)
Sync functionality
Hi, I tried to synchronize the OpenISNI-1 catalog with Wikidata, as I found ways to add identifiers from other sources (importing from VIAF and GRID). I get an error message: 'Error:Unknown action ""'. Here is a screenshot: http://pintoch.ulminfo.fr/a4cb46ba3a/sync_error.png Thanks a lot for your work! − Pintoch (talk) 11:57, 3 February 2017 (UTC)
- It looks like datasets are automatically sync'd after some time. That's even better, thanks for that! − Pintoch (talk) 09:01, 14 March 2017 (UTC)
Improved search?
Hi Magnus, I find myself using the "search only in this catalog" link a lot - which is great, but what would be even more helpful would be a couple of little changes:
- allow filtering of the search results by match status (for example filter out all manually matched entries)
- allow searching of descriptions as well as labels
Any chance this could be done? Thanks! ArthurPSmith (talk) 16:33, 14 February 2017 (UTC)
- @Magnus Manske: One year on, any progress on either of these, especially of the latter? Mahir256 (talk) 20:21, 28 February 2018 (UTC)
Obsolete base
Hello. And sorry. #384 should be totally dropped, as #385 is the good one. Can you have a look? Thierry Caro (talk) 15:59, 17 February 2017 (UTC)
Can we add VICNAMES database?
This is to import this CSV file. (Don't know why the extension says .json when it is actually CSV, but that doesn't matter.) We want to import column 9 "Place Id" into the VICNAMES Place ID (P3472) property. The Wikidata label will be matched against column 4 ("Place Name"), although there are going to be various ambiguities which is why mix'n'match is needed rather than a more automated approach. Columns 2 ("Municipality"), 6 ("Feature Type"), 7 ("Longitude") and 8 ("Latitude") may be useful in trying to resolve some of those ambiguities. (Please ignore column 3 "Name Id", that column is a different ID number from P3472 and is not currently in use by Wikidata.) Note that the CSV file is released under the Creative Commons Attribution 3.0 license – to confirm that yourself, go to VICNAMES, press the Download button (green downward pointing arrow), pick a municipality and a feature type randomly, click "Download", you will see the license agreement links to https://creativecommons.org/licenses/by/3.0/legalcode – also, BTW, I have zero affiliation with the operators of this DB, and haven't discussed this with them, but since they are offering a database download with CC-BY licensing, we don't strictly speaking have to do that. (The linkage of the Wikidata entry to their DB should be sufficient attribution for CC-BY purposes.) Thanks, SJK (talk) 12:29, 10 March 2017 (UTC)
- Probably put the data of the other columns in the "Catalog Description" field. So for example:
- "State","Municipality","Name Id","Place Name","Place Name Status","Feature Type","Longitude","Latitude","Place Id"
- "VIC","EAST GIPPSLAND SHIRE","17990","MOUNT BULLA BULLA","REGISTERED","MT","148.4184722","-37.0604167","11866"
- You could set description as something like "Municipality: EAST GIPPSLAND SHIRE; Place Name Status: REGISTERED; Feature Type: MT; Longitude: 148.4184722; Latitude: -37.0604167". No need to include "State" because it is always "VIC" nor "Name Id" since it is pretty useless (and people might confuse it with the "Place Id" mistakenly.) SJK (talk) 13:00, 10 March 2017 (UTC)
- Oh, and by the way, the URL for each entry is just https://maps.land.vic.gov.au/lassi/VicnamesUI.jsp?placeId= + "Place Id". SJK (talk) 13:06, 10 March 2017 (UTC)
- I found this; I'm getting ready to do it myself now. SJK (talk) 05:00, 11 March 2017 (UTC)
- Okay, I did it. SJK (talk) 08:59, 11 March 2017 (UTC)
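For anyone preparing a similar import, the column mapping described in this thread can be sketched in Python. This is only an illustration built from the sample row quoted above: the output keys (`id`, `name`, `desc`, `url`) and the helper name `row_to_entry` are assumptions made for the example, not the import format Mix'n'match actually requires.

```python
import csv
import io

# URL prefix as given in the thread; the "Place Id" value is appended to it.
URL_PREFIX = "https://maps.land.vic.gov.au/lassi/VicnamesUI.jsp?placeId="

# Columns folded into the description. "State" (always "VIC") and "Name Id"
# are deliberately left out, as suggested in the thread.
DESC_COLUMNS = ["Municipality", "Place Name Status", "Feature Type",
                "Longitude", "Latitude"]


def row_to_entry(row):
    """Map one VICNAMES CSV row (a dict) to a catalog entry (hypothetical keys)."""
    desc = "; ".join(f"{col}: {row[col]}" for col in DESC_COLUMNS)
    return {
        "id": row["Place Id"],
        "name": row["Place Name"],
        "desc": desc,
        "url": URL_PREFIX + row["Place Id"],
    }


# Header and sample row copied from the discussion above.
SAMPLE = (
    '"State","Municipality","Name Id","Place Name","Place Name Status",'
    '"Feature Type","Longitude","Latitude","Place Id"\n'
    '"VIC","EAST GIPPSLAND SHIRE","17990","MOUNT BULLA BULLA","REGISTERED",'
    '"MT","148.4184722","-37.0604167","11866"\n'
)

entries = [row_to_entry(r) for r in csv.DictReader(io.StringIO(SAMPLE))]
```

Reading by column name rather than by position avoids the kind of off-by-one column mix-up that is easy to make when counting columns by hand.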
Catholic Encyclopedia 1913 encoding issue
If you look at this catalog, some of the catalog entries show signs of a bad character encoding / mojibake issue. For example, "Bartholomeu Lourenço de Gusmí£o" which links to Bartholomeu Lourenço de Gusmão when it should be "Bartholomeu Lourenço de Gusmão" and it should link to Bartholomeu Lourenço de Gusmão. Can this be fixed? SJK (talk) 13:13, 10 March 2017 (UTC)
How to back-import properties into mix'n'match
I've created or linked various items to VICNAMES Place ID (P3472) outside of mix'n'match, mainly by using QuickStatements. Is there a way to back-import the matches done outside of mix'n'match back into mix'n'match? SJK (talk) 00:02, 12 March 2017 (UTC)
- I worked it out myself. Click the "Sync" link and then there is a button on that screen to do it. SJK (talk) 04:37, 12 March 2017 (UTC)
"Accept" control missing?
Hi, I'm relatively new to using Mix n Match. I recently setup the Parks and Gardens UK list. I've just looked at it yesterday and the tool has changed, but now I cannot see the "Accept" link in the automatically matched list (or in other lists). I can only see the "Remove" link. Any suggestions what's going on are appreciated! Pauljmackay (talk) 08:05, 17 March 2017 (UTC)
Bug
The linked user names below the "Users" heading on https://tools.wmflabs.org/mix-n-match/#/catalog/403 are broken.
The markup is:
<tbody><tr u="[object Object]"><td><a href="//Auxiliary data matcher">Auxiliary data matcher</a></td> <td class="num">44829</td></tr><tr u="[object Object]"><td><a href="//Pigsonthewing">Pigsonthewing</a></td> <td class="num">1640</td></tr><tr u="[object Object]"><td><a href="//MistressData">MistressData</a></td> <td class="num">9</td></tr><tr u="[object Object]"><td><a href="//Magnus Manske">Magnus Manske</a></td> <td class="num">1</td></tr></tbody>
including, for example:
<a href="//Pigsonthewing">
-- Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 17:52, 30 March 2017 (UTC)
Distinguishing the type of provenance of mappings when querying edit history logs
Hi Magnus, Is there a way to distinguish (via tags or comments)
- items matched automatically by Mix'n'match
- items matched manually from scratch and
- items matched manually after running Mix'n'match
when one queries Wikidata's edit history? If not, is there any data source that can be queried programmatically to get this information? Thanks a lot in advance. --Criscod (talk)
- Manual matches should always be done by the matching user (unless there was some issue at the time, in which case a bot will perform the match on Wikidata later). Mix'n'match "automatic matches" are not pushed to Wikidata, unless a user manually confirms one, which is then done under the respective user name. There are "thorough automatic matches", usually done when a catalog is first created; they rely on name, birth date, etc. and are attributed to a bot. --Magnus Manske (talk) 08:11, 15 May 2017 (UTC)
Please delete two catalogs
Hi Magnus, can you please delete town catalogs? 306 is replaced by 440 and 441 is a duplicate I created by error. Thanks. --Fralambert (talk) 00:00, 15 May 2017 (UTC)
- I have deactivated 306 and 441. --Magnus Manske (talk) 08:08, 15 May 2017 (UTC)
Please delete 443
I made some mistakes in the first catalog I created, and found it easier to create a second one, so catalog 443 can be deleted as it's basically the same as 450. Jon Harald Søby (WMNO) (talk) 09:13, 23 May 2017 (UTC)
- 443 has been deactivated. --Magnus Manske (talk) 09:25, 23 May 2017 (UTC)
Sorting by completeness
Is it possible to have this sorting done by the percentage of unmatched entries in each catalog, as opposed to just moving all completed catalogs to the bottom of the page and leaving the rest not in any particular order? Mahir256 (talk) 19:58, 5 June 2017 (UTC)
- The "completeness" sort takes into account the total number of unmatched and auto-matched entries. Sorting by "percent unmatched" seems a little pointless if a catalog with 200 entries has 50% (=100 entries) unmatched, but is sorted below another with 100000 total, 20% unmatched (=20000 entries). The top catalog in the "completeness" sort is easiest to finish. I can add a "percent unmatched" sort if you really like, but what would be the point? --Magnus Manske (talk) 13:03, 6 June 2017 (UTC)
- Okay, thanks for clarifying the sort order. I suppose I was thinking of something aesthetically pleasing, but I suppose adding a (not unmatched / total) indicator would make the interface similar to s:Special:IndexPages. Mahir256 (talk) 04:54, 13 June 2017 (UTC)
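The two orderings discussed in this thread can be compared with a small Python sketch using the numbers from Magnus's reply. The field names and the `remaining` key are assumptions made for illustration (auto-matched counts are set to 0 because the thread gives none); this is not the tool's actual sorting code.

```python
# Example catalogs from the reply above: 200 entries with 50% unmatched,
# and 100000 entries with 20% unmatched. "automatched" is assumed to be 0.
catalogs = [
    {"name": "small", "total": 200, "unmatched": 100, "automatched": 0},
    {"name": "large", "total": 100000, "unmatched": 20000, "automatched": 0},
]


def remaining(catalog):
    """Entries still needing work: unmatched plus auto-matched (assumed key)."""
    return catalog["unmatched"] + catalog["automatched"]


def percent_unmatched(catalog):
    """Share of unmatched entries, in percent."""
    return 100.0 * catalog["unmatched"] / catalog["total"]


# Fewest remaining entries first: the catalog that is easiest to finish.
by_remaining = sorted(catalogs, key=remaining)

# Lowest percentage first: the large catalog ranks above the small one
# even though it has 200 times as many entries left to match.
by_percent = sorted(catalogs, key=percent_unmatched)
```

With these inputs the absolute-count ordering puts the 200-entry catalog first, while the percentage ordering puts it last, which is the mismatch the reply points out.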
Please delete 493 and 494
Hello Magnus, i managed to import a CSV for the BNA author ID (P3788) but i made some mistakes (first time i messed dates with OpenRefine, second i saved the CSV with "). Catalogs 493 and 494 can be deleted, the good one is 495. Sorry for the mess! Thanks! --Mauricio V. Genta (talk) 07:34, 2 July 2017 (UTC)
- 493 and 494 have been deactivated. --Magnus Manske (talk) 21:44, 1 November 2017 (UTC)
Mix'n'match truncates geo coordinates
Not sure this is the right place for a bug report but... this sudden burst of incorrect "coordinate location" (P625) in WD has its source in an improper import of these coordinates upon creation of items using catalog 440, BNLQ (tracked thanks to User:Fralambert).
I reproduced it here: the imported latitude gets its initial digit truncated: coord (-47.215000, -71.861944) gets imported as (-7.215000, -71.861944)! See the same issue on Q30557727 and about a hundred more items that required manual correction.
Let me know if I need to report this elsewhere. Thanks --Laddo (talk) 01:44, 5 July 2017 (UTC)
- Thanks for letting me know. I have fixed the coordinates in Mix'n'match, looks much better now! --Magnus Manske (talk) 08:07, 5 July 2017 (UTC)
Link in username
Small bug: please change the links on the catalog description page, like https://tools.wmflabs.org/mix-n-match/#/catalog/503. "Framawiki" should point to my userpage on wikidata, not https://framawiki. --Framawiki (talk) 12:39, 14 July 2017 (UTC)
- And link for "automatic" should not be present https://tools.wmflabs.org/mix-n-match/#/entry/7433710 --Framawiki (talk) 12:45, 14 July 2017 (UTC)
- Thanks, fixed! --Magnus Manske (talk) 13:43, 14 July 2017 (UTC)
- Thanks :) --Framawiki (talk) 23:34, 15 July 2017 (UTC)
UK National Archives ID
The National Archives ID property (P3029) is already used on some items on Wikidata, but I would like to start matching these more systematically. They have a straightforward API, and I wonder whether it might be possible to extract more of the data from their details of record creators: for example, for any monastic houses (cf. Cirencester Abbey, Ramsey Abbey), the religious order (P611) could be automatically extracted, but I am not sure how to go about doing this. AndrewNJ (talk)
- It's not entirely clear what data to use; the property seems to be limited to people, not including buildings. I have imported the people here, running name/date-based matching now. --Magnus Manske (talk) 21:43, 1 November 2017 (UTC)
TGN URLs?
Could the URL which entries in catalogue 49 (TGN IDs) use be changed to "http://www.getty.edu/vow/TGNFullDisplay?find=&place=&nation=&subjectid=$1"? This URL, as opposed to the one ("http://vocab.getty.edu/tgn/$1") which is used at the moment, actually displays synonyms for the given place name (indicating its level of administrative division) and the appropriate position in the geographic hierarchy of the place. To find these with the existing URL requires two more clicks to get to the URL I'm suggesting, which slows down identification of the place. @Magnus Manske: Mahir256 (talk) 05:39, 22 September 2017 (UTC)
- I seem to have done this recently, though I don't remember... let me know if it still needs fixing! --Magnus Manske (talk) 15:57, 1 November 2017 (UTC)
Entries missing from Statues Vanderkrogt
Several entries on vanderkrogt.net seem to be missing from Mix'n'match, e.g. this bust of Anne Frank. (I've already added the Vanderkrogt.net Statues ID property manually to Bust of Anne Frank (Q41688543).) This seems to be the case with several other entries in the Greater London section of the website. Ham II (sgwrs / talk) 07:40, 11 October 2017 (UTC)
The Strong Museum of Play collection
Can you make some categories based on the collections from http://www.museumofplay.org/collections, if you can? I don't know what property they would correspond to?--MisstressD (talk) 02:28, 29 October 2017 (UTC)
Wikidata property for Women's Manuscripts
Hi, I uploaded this women's manuscripts collection, but I didn't include a Wikidata property. I've since made this one for it. Could you please add it for me? Thank you Rachel Helps (BYU) (talk) 21:10, 20 November 2017 (UTC)
- Ah, you linked to an item, not a property? A property (like this one) needs proposal, community discussion, and consensus on creation. --Magnus Manske (talk) 23:14, 20 November 2017 (UTC)
- Oops. Thanks for being patient while I figure this out. I'm not sure if the collection is "notable" enough to be a property, since it would only apply to about 20 items. Rachel Helps (BYU) (talk) 17:50, 22 November 2017 (UTC)
The Smithsonian Museum Collections ID
Items from this page, featuring objects from all the Smithsonian museums http://collections.si.edu/search/ I can't scrape them. --MisstressD (talk) 01:40, 2 December 2017 (UTC)
- Running now for this catalog, but "only" the 2.8M entries with picture/video. 13M is too many; 2.8M already is... --Magnus Manske (talk) 10:05, 6 December 2017 (UTC)
Deleting catalog 388
Hi!
We want to migrate FundRef ids from DOIs to their own property wikidata:Property:P3153, so it would be great if catalog 388 could be deleted as it uses DOIs. I will migrate the existing claims to the new property once this is done.
Thanks a lot! − Pintoch (talk) 18:55, 11 December 2017 (UTC)
- I have deactivated 388. --Magnus Manske (talk) 10:36, 12 December 2017 (UTC)
Duplicates?
Isn't https://tools.wmflabs.org/mix-n-match/#/catalog/95 and https://tools.wmflabs.org/mix-n-match/#/catalog/794 about the same catalogue? --Anvilaquarius (talk) 10:16, 2 January 2018 (UTC)
- You are correct, thanks! I have deactivated #794, and set its autoscraper to update #95 instead. --Magnus Manske (talk) 21:09, 2 January 2018 (UTC)
Issues with Unicode?
@Magnus Manske: It appears that User:Александр Сигачёв's name appears as question marks in recent changes (yielding an equally incorrect URL for his user page in the process). Also, in the TGN catalog, non-ASCII code points get translated into escape sequences ("Santarém" becomes "Santar\u00E9m").
Can these issues with non-ASCII code points be fixed somehow? Mahir256 (talk) 17:37, 24 January 2018 (UTC)
- Fixed the TGN catalog. Will look into the username, different issue apparently. --Magnus Manske (talk) 15:19, 25 January 2018 (UTC)
- Thank you, @Magnus Manske:, and happy you day! Mahir256 (talk) 06:23, 26 January 2018 (UTC)
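As an illustration of the escape-sequence problem reported above, a label such as "Santar\u00E9m" can be turned back into readable text by decoding the `\uXXXX` sequences. The Python sketch below is a generic approach with a hypothetical helper name; it is not a description of how the catalog was actually fixed.

```python
import re

# Matches a literal backslash, "u", and exactly four hex digits.
_UNICODE_ESCAPE = re.compile(r"\\u([0-9a-fA-F]{4})")


def decode_unicode_escapes(text):
    """Replace literal \\uXXXX sequences with the characters they encode.

    Simplification: code points outside the Basic Multilingual Plane, which
    are written as two surrogate escapes, are not recombined here.
    """
    return _UNICODE_ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), text)


# The string below contains a real backslash followed by "u00E9".
fixed = decode_unicode_escapes("Santar\\u00E9m")
```

Labels without escape sequences pass through unchanged, so the function can be applied to a whole catalog column without first checking which rows are affected.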
Library of Wales duplicates
I find that the Library of Wales catalog has a large number of duplicate entries. For example, I came across [10], [11], [12] and [13], which are all about the same person. And that is not an extreme case at all, there are several with even more links. Could these extraneous links be removed, or at least have an 'n/a' automatic matching? (in the case mentioned here, the first link is the correct one) - Andre Engels (talk) 10:22, 8 February 2018 (UTC)
- I have manually culled a lot of them into N/A status. Might have missed some, or mis-identified others. No good single pattern to the "bad" ones. --Magnus Manske (talk) 13:35, 8 February 2018 (UTC)
Please delete 1009
Matching with "lastname, surnames" does not seem to work properly. Please delete catalog 1009.
- 1009 deactivated. --Magnus Manske (talk) 13:24, 23 February 2018 (UTC)
What is the right tool to find entities in wikidata
I'd like to match persons only by their names and wonder if Mix'n'Match is the right tool for this purpose?
- Mix'n'match would certainly work for that, especially if the list is static (doesn't change), or can be auto-updated from a website. Other options are OpenReconcile and a Google Spreadsheet plug-in; see here for what the Bodleian Library (Oxford) have looked at. --Magnus Manske (talk) 18:55, 23 February 2018 (UTC)
Enhanced download options
Hi there, I've just started playing with Mix'n'match and have a few queries. Firstly, the download link in the dropdown action menu only downloads the matched items, not the unmatched items. I've been looking at this catalog, which has 135 pages of unmatched items. Most items are not notable and will never have a Wikipedia page. But there are over 750 Wikipedia articles that should be matched within those 6700 unmatched items. If I could download the full list, I could probably fairly quickly match quite a few, with some data manipulation in Excel, but I'm not going to download, manually review or game mode 135 pages of data. Can the download button please either download all items, with a field for each item's Mix'n'Match status, or default to downloading whichever subset (manually, auto, unmatched, n/a) you are viewing at the time?
Also, when you are in game mode, the action button isn't shown to switch back to another mode. The-Pope (talk) 16:23, 21 February 2018 (UTC)
Translation; Mix'n'match/6/en needs fixup
I wish to put Mix'n'match/6/en or "w:en:Manual for small and new Wikipedias" as w:ja:すべての言語版にあるべき項目の一覧 to link to the jawiki page. Presently, it outputs erroneously. Can you have a look and correct it, please? --Omotecho (talk) 17:43, 25 February 2018 (UTC)
- Hi, I'm not sure what you mean. All the links you give are broken. There is a Japanese page here, if that's what you mean? What's the problem with it? (note: I don't speak Japanese) --Magnus Manske (talk) 08:49, 26 February 2018 (UTC)
- Hello, I mean, two points were mixed up. When I tried to replace the English page link with the following links in the translation window, it did not work, thus I thought I would ask for your help.
- Talking about w:en:Manual for small and new Wikipedias, the link you suggested is exactly which I wished to replace the en link.
- for the “See also” section, it will be handy to give link to Japanese page for List of articles every Wikipedia should have (ja).
- Now, out of curiosity, how do you use $-parameters and Special/MyLanguage, and under what rules? May I replace links in English source pages with either of them, so that translators just translate the words after the pipe or |, and the link will be of the local language (if there is a translated page)? That way, translating a page will save much time; at the moment I am going back and forth from tab to tab checking if we have ja pages for the English page links in the paragraph. Especially with Help pages, I find it rather disappointing to jump to English pages, even if the link is written in Japanese. (See the case with “List of articles every Wikipedia should have.”) Cheers, --Omotecho (talk) 19:04, 26 February 2018 (UTC) edited. Omotecho (talk) 18:48, 26 February 2018 (UTC)
Deutsche Biographie scraper help
Hi All, I was trying to make an automated scraper on Mix'n'Match for an important database called the Deutsche Biographie (https://www.deutsche-biographie.de) but I wasn't able to (not much of a professional). It seems easy though, as the links seem simple (https://www.deutsche-biographie.de/sfz70756.html) Could anyone help? Also this might help: http://data.deutsche-biographie.de/beta/solr-open/ http://data.deutsche-biographie.de/beta/sparql/ Thanks, Adam Harangozó (talk) 11:17, 2 March 2018 (UTC)
- Made a scraper, running now, will be here. I'll have to pull descriptions separately. Any Wikidata property for this yet? --Magnus Manske (talk) 11:37, 2 March 2018 (UTC)
- Update: 2014 is NDB, 1042 is ADB. --Magnus Manske (talk) 11:49, 2 March 2018 (UTC)
- Amazing, thanks! No wikidata property, only Q1202222. NDB and ADB are combined on this site, so they are separate things on their own. --Adam Harangozó (talk) 12:14, 2 March 2018 (UTC)
- So I think it should be one database (Deutsche Biographie), NDB and ADB can just be reached as scanned books through these profiles (http://daten.digitale-sammlungen.de/0001/bsb00016233/images/index.html?fip=193.174.98.30&id=00016233&seite=372). --Adam Harangozó (talk) 14:31, 2 March 2018 (UTC)
- I can only have one scraper per catalog. I can merge them, but need to deactivate one scraper. Are both ADB and NDB complete, or do they add entries over time? --Magnus Manske (talk) 14:45, 2 March 2018 (UTC)
- I don't know, but to me it seems that we need a single scraper looking at all the identifiers starting with sfz, as it does not matter if they are in both ADB and NDB or only in one. For example: "tools.wmflabs.org/mix-n-match/?#/search/Rudolfus Collinus" They still have the same identifier, which I think should simply be called Deutsche Biographie. (So ADB and NDB are just sub-categories here, no need to worry about them) --Adam Harangozó (talk) 13:08, 3 March 2018 (UTC)
User:Adam Harangozó, User:Magnus Manske - the official recommendation by the Deutsche Biographie is to use the GND, not the SFZ.
- http://www.historische-kommission-muenchen-editionen.de/beacon_adb.txt
- http://www.historische-kommission-muenchen-editionen.de/beacon_ndb.txt
- http://www.historische-kommission-muenchen-editionen.de/beacon_db_register.txt
Please create a catalog based on GND. 78.51.217.6 21:26, 28 July 2018 (UTC)
- Now here. Still loading, as it needs to check each URL for the details, which are not in the BEACON files. --Magnus Manske (talk) 07:59, 30 July 2018 (UTC)
"Improved search?" part 2
Apologies if you happened to miss my ping about the subject among the rest of the above subsections, but a year ago ArthurPSmith asked about possible improvements to the search feature, namely filtering by match status and searching through descriptions. Do you have any updates about those? Mahir256 (talk) 17:52, 2 March 2018 (UTC)
- @Magnus Manske: I am genuinely curious as to whether these are being worked on, and I'm sure Arthur probably still is as well. My apologies if the pings are getting annoying. Mahir256 (talk) 02:41, 6 June 2018 (UTC)
- I am not working on those, and I am not considering them a priority. --Magnus Manske (talk) 10:14, 14 June 2018 (UTC)
Update a non-scraper catalogue?
Hi,
I could swear I had read a documentation about that, but cannot find it anymore :-(
How would I go about updating a CSV-backed Mix’n’match catalogue? I had made a mistake when scraping 989 (all games are set to Mega Drive). I re-scraped the website and have a CSV ready − what would be the process to update the catalogue?
Thanks! Jean-Fred (talk) 12:31, 4 March 2018 (UTC)
- Can you just import it and I'll close the old catalog? --Magnus Manske (talk) 10:47, 19 March 2018 (UTC)
Retrieve the config of a scraper-backed catalogue?
I have loaded several catalogues using the Scraper tool − works fine :)
In several cases, I notice a discrepancy with Wikidata, which clearly hints at a mistake I made when scraping (eg in User:Magnus_Manske/Mix%27n%27match_report/789, I must have done something wrong with the 'Z').
I would thus be keen on fixing my scraper ; however I have not backed up the config (especially the regexes) I used ; and they were such a pain to craft in the first place, that I really hope I don’t have to start from scratch again :)
Is there any way to look up the underlying config?
Jean-Fred (talk) 12:34, 4 March 2018 (UTC)
Creation candidates comes out empty
I used to work with Creation candidates a lot, but now it doesn't work at all. For example https://tools.wmflabs.org/mix-n-match/#/creation_candidates/human gives "No results, parameters might be too restrictive" - https://tools.wmflabs.org/mix-n-match/#/creation_candidates hasn't been working for a longer time, trying to get the empty name, then grinding the whole browser to a halt. - Andre Engels (talk) 07:03, 2 June 2018 (UTC)
- /human works again - Andre Engels (talk) 09:30, 4 June 2018 (UTC)
- No, still not working correctly - I now get the same entry over and over again, there does not seem to be anything else available. - Andre Engels (talk) 12:01, 4 June 2018 (UTC)
- I should probably turn that off. I have a bot creating these automatically now... --Magnus Manske (talk) 15:06, 5 June 2018 (UTC)
Multilingual matching and value addition
I regularly add multilingual CC-0 dataset for matching (latest being the EUROVOC). Multilingual matching and multilingual label addition would be a boon for those cases. --Teolemon (talk) 15:14, 4 July 2018 (UTC)
Problem to create item with Visual tool
Hi,
Practically every time I try to create an item with the Visual tool, I get this message: "Wikidata error: Must be no more than 250 characters long". Is it possible to fix this? Simon Villeneuve 11:09, 30 July 2018 (UTC)
Extracting country from description
I know there's a script especially made to extract birth and death dates, but is it possible to extract P27 from description field in catalog #1538?--HeavyTony (talk) 01:41, 3 August 2018 (UTC)
- Can do, but are you sure that "Country of origin: X" is P27 and not "country of origin (P495)"? --Magnus Manske (talk) 08:17, 3 August 2018 (UTC)
- I'm generating P27 ones now. --Magnus Manske (talk) 08:36, 3 August 2018 (UTC)
- That's perfect.--HeavyTony (talk) 11:49, 3 August 2018 (UTC)
Adding property P434 to catalog 1486
So, there's more than 57 000 entries, but "This catalog has no Wikidata property!". --HeavyTony (talk) 04:24, 4 August 2018 (UTC)
- Had a look, looks like 1486 was fixed in the meantime :) Jean-Fred (talk) 18:34, 2 October 2018 (UTC)
Search problems
For the last few days I have been unable to have a search complete at my home and work. Is there something up? William Graham (talk) 22:13, 19 August 2018 (UTC)
- Looks like it's resolved. William Graham (talk) 18:16, 20 August 2018 (UTC)
Hi, I have the issue that the wikidata search in the visual tool only works sometimes. Some example queries that don't work and just show "Searching...": Chang and Eng ; John Kasich ; L H Myers ; Haakon II Sigurdsson ; Thomas Hartman FS100 (talk) 12:18, 19 September 2018 (UTC)
Critical Condition Film Ids
The films featured on these lists http://www.critcononline.com/alphabetical%20list%20of%20movies.htm--MisstressD (talk) 22:24, 1 October 2018 (UTC)
Modify scraper after saving
@Magnus Manske: Is it possible to edit (or at least view) the scraper after having saved it? --Malore (talk) 00:54, 3 October 2018 (UTC)
Please deactivate catalogs 1525 & 1526
Could you also rename 1527 to have 1525 title? thanks--HeavyTony (talk) 15:32, 5 December 2018 (UTC)
Adding ISSN and BFI level to catalog 2028
@Magnus Manske: Could you extract ISSN and BFI from description in catalog 2028?--HeavyTony (talk) 16:02, 15 December 2018 (UTC)
Replacing a catalog
@Magnus Manske: I created the original PCGamingWiki catalog (https://tools.wmflabs.org/mix-n-match/#/catalog/2159) about a month ago, but it had some encoding issues that were causing problems. I've now uploaded a "better" version of the catalog here: https://tools.wmflabs.org/mix-n-match/#/catalog/2196.
Can you delete the old catalog? Thanks for this awesome tool :) Nicereddy (talk)
- I marked the old catalogue as inactive :)
- @Magnus Manske: On both the old catalogue and the new one that Nicereddy imported, Mix’n’match made Zero automatic matches, which is weird − before Nicereddy went on mass-linking with a script, there were hundreds of easy matches that mix’n’match would have typically picked up. Even now, I think there should be some more potential matches − eg this to Agricultural Simulator 2012 (Q11849886).
- Jean-Fred (talk) 21:37, 7 February 2019 (UTC)
- Actually, Nicereddy clarified to me that there were auto-matches on that second catalogue (he was just too quick to either accept the good ones or reject the bad ones before I could notice ^_^). Jean-Fred (talk) 09:14, 8 February 2019 (UTC)
Please deactivate catalog 2217 and 2220
@Magnus Manske: Please deactivate catalog 2217 and 2220. IDs of catalog 2217 are not decoded. Property of catalog 2220 is wrong.--本日晴天 (talk) 11:00, 5 March 2019 (UTC)
Please delete catalog 2362
@Magnus Manske: I created that NGMDb catalog in error. Can you please remove it so I can use mix'n'match without conflating that catalog with NGMDb ID? Thank you, Trilotat (talk) 18:42, 8 May 2019 (UTC)
- Done Disabled. Jean-Fred (talk) 21:58, 10 May 2019 (UTC)
When item matches multiple WikiData entries
How should one handle an entry that should map to multiple entries in WikiData? (Perhaps a future enhancement to the tool.) An example, set is CathEn 1913, entry 25390652, catalog id 12344b (article title Praxedes and Pudentiana) should match both Q268087 and Q676485. --Dcheney (talk) 06:35, 11 August 2019 (UTC)
Please delete catalog 2737
@Magnus Manske: I created the lokalhistoriewiki catalog with a lot of error. My first time creating a catalog, I will make a new catalog. I'm sorry for making a mess. - Premeditated (talk) 21:36, 21 August 2019 (UTC)
Please delete catalog 3325
@Magnus Manske, Pigsonthewing, Jean-Frédéric, Adam Harangozó, Harmonia Amanda, Thierry Caro, Ash Crow, Salgo60, and Gerwoman: (since last I heard you all are catalog admins--indeed there should be a nicer and shorter way of pinging you all): I set up a scraper for a website hoping that it would skip numeric identifiers that didn't yield matches (e.g. if "56" and "58" yielded matches but "57" didn't, I thought the scraper would simply skip "57" rather than terminate at "56"). Since that scraper evidently failed, I opted instead to just upload a bunch of IDs manually (which, as it turns out, can't be used to update existing catalogs per the current import form). I thus humbly request that catalog #3325 be marked inactive (until, perhaps, the behavior of the autoscraper in this case could be adjusted). Thank you! Mahir256 (talk) 03:13, 26 January 2020 (UTC)
- Done --Gerwoman (talk) 09:36, 26 January 2020 (UTC)
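The behaviour hoped for in this request (skip an identifier that yields nothing and carry on, instead of stopping at the first gap) can be sketched in Python as follows. The `fetch` callable, the function name and the `max_consecutive_misses` cut-off are all hypothetical; nothing here reflects how the Mix'n'match autoscraper is implemented.

```python
def scrape_range(ids, fetch, max_consecutive_misses=50):
    """Collect entries for the given ids, skipping ids that yield nothing.

    `fetch` is a caller-supplied function (hypothetical) returning the entry
    for an id, or None when the page has no match. Scraping stops only after
    `max_consecutive_misses` misses in a row, not at the first miss.
    """
    results = {}
    misses = 0
    for identifier in ids:
        entry = fetch(identifier)
        if entry is None:
            misses += 1
            if misses >= max_consecutive_misses:
                break  # a long run of gaps probably means the end of the data
            continue
        misses = 0  # any hit resets the gap counter
        results[identifier] = entry
    return results


# Stand-in for a website where 56 and 58 exist but 57 does not.
pages = {56: "entry 56", 58: "entry 58"}
found = scrape_range(range(56, 59), pages.get)
```

The cut-off keeps the loop from running forever on an open-ended range while still tolerating isolated missing identifiers such as "57" in the example from the request.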
Please delete catalog 3345
@Magnus Manske, Pigsonthewing, Jean-Frédéric, Adam Harangozó, Harmonia Amanda, Thierry Caro, Ash Crow, Salgo60, and Gerwoman: Catalog 3345 is a duplicate of 3344. Please delete It. Thanks
Nikola Tulechki (talk) 11:03, 30 January 2020 (UTC)
- Done Marked as deactivated. Jean-Fred (talk) 11:07, 30 January 2020 (UTC)
Polish catalogues
I see that several Polish catalogues are not in the group country_polska. How can I add them to it? Matlin (talk) 22:46, 15 February 2020 (UTC)
Scraper problem?
So I set up a scraper for a mix'n'match catalog (#3407) that, when I tested it in the catalog creation screen, worked fine (it gave two well-formatted results), but somehow didn't capture anything when I actually created it (every time I try to run the job, it returns nothing). Is something wrong with either the scraper or the way I created the catalog? Did my use of a very long regex to identify matches or a list of 20,000 URLs to scrape cause a problem? Mahir256 (talk) 21:40, 17 February 2020 (UTC)
- @Magnus Manske: While I could try manually importing this catalog, I don't want to do so without knowing why the scraper I set up does not work, especially given the previous scraper I tried to set up (described above) also failed for (possibly different?) reasons. Mahir256 (talk) 22:34, 19 February 2020 (UTC)
- Looks like it worked after all? --Magnus Manske (talk) 09:25, 27 February 2020 (UTC)
- @Magnus Manske: Indeed it has; if it just took a long time (getting 22k pages is a lot), then it should be marked as "doing" in the jobs list whenever I check on it rather than just go straight to "done" after a few seconds. Mahir256 (talk) 04:44, 29 February 2020 (UTC)
- I think I am running into the same problem with (#3448) – the job status indicates that it has finished but there are no results. There is a large range to work through, so perhaps we'll see in another day or so? AndrewNJ (talk) 22:09, 20 March 2020 (UTC)
Disabled catalog inaccessible
I deactivated (in the catalog_editor) 3411, as the underlying property needed to be auto-fixed. I would now want to re-enable it, however https://tools.wmflabs.org/mix-n-match/#/catalog_editor/3411 does not work anymore − the JS console throws `TypeError: "catalog is undefined"`
Jean-Fred (talk) 12:28, 21 February 2020 (UTC)
- I have reactivated the catalog. --Magnus Manske (talk) 09:23, 27 February 2020 (UTC)
Mix'n'Match: FragDenStaat.de
@Magnus Manske: Since the current data set is quite outdated, I've created a web scraper for FragDenStaat.de. Could you please add it, since you're the owner of the catalogue?
URL pattern: https://fragdenstaat.de/api/v1/publicbody/?format=json&limit=50&offset=$1
RegEx entry: {.+?"id":([0-9]+),"name":"([^"]+)","slug":"([0-9A-Za-z_]+(\-[0-9A-Za-z_]+)*)","other_names":"([^"]*)","description":"([^"]*)",.*?,"classification":(null|{.*?"name":"([^"]+)").*?"jurisdiction":{.*?"name":"([^"]*)
id: $3
name: $2
desc: $9, TYPE: $8, ALT NAME: $5, DESCRIPTION: $6, ID: $1
url: https://fragdenstaat.de/behoerde/$3
Apart from that, I've noticed a PHP warning which occurred when clicking on "Test this Scraper". It led to an invalid JSON response for the AJAX request and was displayed as "unknown error" in the user interface. Unfortunately, I don't remember my input.
<br /> <b>Warning</b>: preg_match_all(): Unknown modifier 'b' in <b>/data/project/mix-n-match/autoscrape.inc</b> on line <b>733</b><br /> <br /> <b>Warning</b>: preg_match_all(): Unknown modifier 'b' in <b>/data/project/mix-n-match/autoscrape.inc</b> on line <b>733</b>
Thank you very much (especially for your tools) and best wishes --Nw520 (talk) 00:50, 28 February 2020 (UTC)
- Found the issue with the unknown modifier. I had a RegExp with slashes which weren't properly escaped. Maybe it would be better to define a custom error handler to prevent these warnings from being output and therefore breaking the JSON. --Nw520 (talk) 23:32, 25 March 2020 (UTC)
- Added an issue for that. --Nw520 (talk) 23:49, 25 March 2020 (UTC)
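For context on the warning in this thread: PHP's preg functions expect the pattern wrapped in delimiters, and an unescaped delimiter inside the pattern ends it early, so the following character is read as a modifier (hence "Unknown modifier"). The Python sketch below shows one way to escape the delimiter before wrapping. It assumes "/" is the delimiter and uses a made-up pattern, since the original input was not recorded; the function names are hypothetical.

```python
def escape_delimiter(pattern, delimiter="/"):
    """Backslash-escape every delimiter in `pattern` that is not already escaped."""
    out = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern):
            # Keep an existing escape pair (for example "\/" or "\-") untouched.
            out.append(pattern[i:i + 2])
            i += 2
            continue
        out.append("\\" + ch if ch == delimiter else ch)
        i += 1
    return "".join(out)


def wrap_pattern(pattern, delimiter="/"):
    """Return the pattern enclosed in delimiters, safe to hand to a preg-style API."""
    return delimiter + escape_delimiter(pattern, delimiter) + delimiter


# Hypothetical user input containing slashes. Wrapped without escaping, the
# pattern would end right before "behoerde" and "b" would be read as a modifier.
wrapped = wrap_pattern('href="/behoerde/([a-z-]+)/"')
```

Escaping on the server side like this would make slashes in user-supplied patterns harmless, independently of whether a custom error handler is also added.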
"Automatically matched" and "automatched" (compared to "manually matched")
Please see d:Topic:Vjres76a1f43dhs2 about a possible change of labels. Jura1 (talk) 22:19, 2 April 2020 (UTC)
- "Automatically matched" is now "Fully matched"
- "Manually matched" is now "Preliminarily matched"
Jura1 (talk) 12:53, 18 April 2020 (UTC)
where can i find info about catalogue changes ?
i am unable to find info about catalog changes. where can i find them ? for some reason, i am unable to find info about Rachel C. Thomson (Q58874674), i tried this search, perhaps i am wrong. btw, great tool. Leela52452 (talk) 06:34, 25 April 2020 (UTC)
Please deactivate catalog 2698
@Magnus Manske: Please deactivate catalog 2698. The identifiers are totally obsolete. See also d:Property talk:P3231#ID change. 本日晴天 (talk) 08:10, 27 April 2020 (UTC)
- Done.--Magnus Manske (talk) 08:40, 27 May 2020 (UTC)
catalog 2528 redirecting to dummy.org
Hello @Magnus Manske: catalog 2528 is redirecting to a dummy website. The website is for sale. Leela52452 (talk) 09:51, 28 April 2020 (UTC)
- Fixed. --Magnus Manske (talk) 08:44, 27 May 2020 (UTC)
'Remove' link is not working
The 'Remove' link (for removing preliminary, and in rarer cases also other matches) is not working. The item is grayed out, and remains grayed out without anything happening. In a perhaps related issue, the 'Create new item' button at the bottom of creation candidates pages has the same issue, and has had it for longer (that was less of an issue because it could be circumvented by creating an item from one link, then adding it to the others). - Andre Engels (talk) 08:24, 15 May 2020 (UTC)
- Should be fixed now. --Magnus Manske (talk) 08:41, 27 May 2020 (UTC)
WD descriptions
After the migration to Toolforge, the tool doesn't load the Wikidata descriptions anymore ("Could not load description for [Q-ID]"). Is it temporary behaviour? --INS Pirat (talk) 20:52, 4 June 2020 (UTC)
- Everything is working well now. Thanks. --INS Pirat (talk) 20:43, 8 June 2020 (UTC)
Polish catalogues
Please clean up the Polish catalogues. These are the ones to add:
- Gry Online company ID
- IPSB
- National Museum in Warsaw artist
- PSB
- PAU Kraków
- Priests of the Archidiecezja Gdańska
- MusicBrainz bands PL
- Encyklopedia Fantastyki
- Encyklopedia Leśna
And to remove:
--Matlin (talk) 17:17, 5 June 2020 (UTC)
- @Matlin: What do you mean by cleaning?
- As far as I understand, the grouping per country is inferred from the property linked to the catalog − so you’d need to tag the properties accordingly.
- Jean-Fred (talk) 12:43, 16 June 2020 (UTC)
Visual tool not working?
I've tried to use the visual matching tool in both Firefox and Chrome and it is showing as blank. I also tried this with different catalogs, same results.
--Nashona (talk) 16:27, 11 June 2020 (UTC)
- I have been having the same problem for over 2 weeks and submitted a ticket to @Magnus Manske: here on June 3 but haven't seen any updates yet. --Infopetal (talk) 19:05, 16 June 2020 (UTC)
Usability issue
In Match mode, would it be possible to consider the replacement of [↑] with something easier (bigger) to click? I would suggest something like [Match!] or [Match it]. Thanks --Luckyz (talk) 06:31, 15 June 2020 (UTC)
Auto-generated descriptions use wrong pronouns for transgender people
- Pinging tool maintainers (as listed on Toolforge): User:MaxFrax96, User:Magnus Manske, User:Hjfocs
When Mix'n'match suggests possible Wikidata items which correspond to a given external item, it automatically generates a description for that item. When browsing preliminary matches for Politifact IDs, I found that it generated this description for Christine Hallquist (Q56167585): "Christine Hallquist is a US-American politician. He was born on April 11, 1956 in Baldwinsville. He studied at Mohawk Valley Community College." Hallquist is a transgender woman, and referring to her as "he" is likely to cause offense.
In Wikidata, her gender is set to trans woman (Q1052281), so this seems to be a code error and not an issue with the data. Whatever code generates these descriptions needs to be corrected to use feminine pronouns here. I'm guessing that what might have happened here is that the code didn't recognize trans woman (Q1052281), and fell back to he/him/his pronouns as a default. If this is the case, it would probably be a good idea to change the fallback pronouns to they/them/their to avoid causing unintended offense in the future. –IagoQnsi (talk) 17:59, 21 July 2020 (UTC)
- Ah, just noticed there's Bitbucket issue reporting for this project; I've just opened issue #63 for this bug. –IagoQnsi (talk) 18:03, 21 July 2020 (UTC)
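The fallback behaviour proposed above can be sketched as follows. This is a minimal illustration, not the tool's actual description generator: the function names are hypothetical, and the only gender items handled are female (Q6581072), trans woman (Q1052281), male (Q6581097) and trans man (Q2449503). Anything else, including a missing value, falls back to "They".

```python
from typing import Optional

# Wikidata items used as values of "sex or gender" (P21).
FEMININE = {"Q6581072", "Q1052281"}   # female, trans woman
MASCULINE = {"Q6581097", "Q2449503"}  # male, trans man


def subject_pronoun(gender_qid: Optional[str]) -> str:
    """Pick a subject pronoun, defaulting to 'They' for unrecognised values."""
    if gender_qid in FEMININE:
        return "She"
    if gender_qid in MASCULINE:
        return "He"
    return "They"


def born_clause(gender_qid: Optional[str]) -> str:
    """Build the start of a birth sentence with matching verb agreement."""
    pronoun = subject_pronoun(gender_qid)
    verb = "were" if pronoun == "They" else "was"
    return f"{pronoun} {verb} born"
```

For example, `born_clause("Q1052281")` yields "She was born", while an unrecognised or missing gender value yields "They were born".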
Please delete catalog 3880
@Magnus Manske, Pigsonthewing, Jean-Frédéric, Adam Harangozó, Harmonia Amanda, Thierry Caro, Ash Crow, Salgo60, and Gerwoman:
Please delete my catalog 3880 (or delete data). Sorry for my mistake in IDs. --Manu1400 (talk) 15:08, 6 October 2020 (UTC)
- Done. Jean-Fred (talk) 17:35, 6 October 2020 (UTC)
- I don't have those privileges, but I would like to reload catalog/1223, as it contains items with Show False that should not be in the catalogue (see API call). Does anyone know how to do that? - Salgo60 (talk) 15:25, 6 October 2020 (UTC)
Logs & Update Regex
I don't think the catalog I created, catalog 3986, is working correctly. I can't seem to find any logs. Even if there were logs, I don't see any way to update the regex? Thanks for your help! U+1F360 (talk) 15:52, 28 November 2020 (UTC)
Scraper functionality down?
When I try to create a new scraper from https://mix-n-match.toolforge.org/#/scraper/new , no matter what URL (including http://example.com or http://python.org ) I put in the URL pattern field, when I test it, it reports empty HTML back. Is this happening to anyone else? JesseW (talk) 03:36, 19 March 2021 (UTC)
- Now filed an issue in the repo: https://bitbucket.org/magnusmanske/mixnmatch/issues/67/curl-isnt-working-on-toolforgeorg , including explicit descriptions of how to fix it. Hopefully it will attract attention soon. JesseW (talk) 01:53, 20 March 2021 (UTC)
Delete some entries from catalog?
I've added several new entries to an existing catalog with the wrong IDs by mistake. I have flagged them (90 entries) as "Not applicable to Wikidata". Is it possible to remove them from the catalogue? https://mix-n-match.toolforge.org/#/catalog/3476 . Solidest (talk) 21:55, 27 April 2021 (UTC)
Could not load description for Qxx
@Magnus Manske: It seems changes made on Mix'n'match are not being reflected on Wikidata. For example, I have added http://www.artnet.com/artists/galina-smirnova-2 on Q4424803. Please look into the issue. Gi vi an (talk) 11:03, 23 May 2021 (UTC)
GBIF & GBIF taxon ID (P846)
Hello @Magnus Manske! Could you please make Mix'n'match add GBIF taxon ID (P846) for GBIF IDs instead of using described at URL (P973)? Thanks. Tol (talk | contribs) @ 02:48, 25 October 2021 (UTC)
- Done, syncing now. --Magnus Manske (talk) 08:12, 25 October 2021 (UTC)
What's with the random mode?
I like that mode e. g. https://mix-n-match.toolforge.org/#/random/4156 - but it takes many minutes after each matching for the next one to load. This makes it unusable, sadly. --Anvilaquarius (talk) 20:25, 11 January 2022 (UTC)
Please remove catalog #5011
Hi @Magnus. Could I ask you to please remove catalog #5011? There was an issue with diacritics in the data that I didn't catch before upload. I've fixed the issue but would like this catalog removed so I can create a new one with the revised data. Apologies for the error. Thanks! --Ostapt (talk) 13:32, 18 January 2022 (UTC)
Mix'n'match is not working properly
Am I the only one facing problems with Mix'n'match? The tool has not been working properly for more than two weeks now. Last week I couldn't even open the Mix'n'match tool. Now the same thing is happening (it loads endlessly, see screenshot), and it looks like it's no longer possible to create any new item in Creation candidates. The same thing happens in the search section. Can someone please explain what's going on? Is there maintenance work in progress, or is the tool facing some serious issues? Regards Kirilloparma (talk) 04:40, 16 February 2022 (UTC)
- Same here. The tool has been off and on again for some time, but now it seems off for good. Item creation was ridiculously slow (but did work) last week for me. --Anvilaquarius (talk) 12:42, 16 February 2022 (UTC)
- Creation candidates have been stuck for several weeks now (screenshot), and there is no good way to create a new item corresponding to an available catalog. Let's ping @Magnus Manske to clarify the situation. Regards Kirilloparma (talk) 18:23, 22 February 2022 (UTC)
- I think creation candidates were fixed a week or two ago. At least for my own catalogues. Solidest (talk) 17:03, 15 March 2022 (UTC)
Catalog 4992 - P1266 could be added as auxiliary
P1266 and P10266 share the same values. Both sites belong to the same company and share the same database. The main differences are the language and the main country. --QTHCCAN (talk) 16:30, 26 April 2022 (UTC)
Auxiliary data for catalog #2422
The description fields contain the test result (left) and the imdb id (right). So, we could have converted it to P5021:Q4165246 P9259:fail or pass and obviously P345 --QTHCCAN (talk) 17:43, 10 June 2022 (UTC)
Databáze her ID property
There is a Databáze her ID property - d:Property:P10096 that could be added to the Catalog 4879 and then the matches could be transferred — Pius (talk) 10:54, 30 August 2022 (UTC)
- The catalogue and the property are not using the same ID scheme. Jean-Fred (talk) 13:18, 30 August 2022 (UTC)
- @Pius: Done m'n'm catalog for numeric IDs. Regards Kirilloparma (talk) 01:29, 16 October 2022 (UTC)
Catalogs not adding entries
Is there a reason why the newest 6 categories aren't having their entries added to them? It shows 0 for everything. Wd-Ryan (talk) 02:12, 11 October 2022 (UTC)
- Seems to be fixed now. Wd-Ryan (talk) 21:54, 11 October 2022 (UTC)
Matches on Mix'n'Match not added to Wikidata
I've run into an issue on several occasions where an identifier appears to have been already matched and attached to its corresponding Wikidata item (e.g. Auto-matched via auxiliary data), but the identifier does not appear on the item. I've seen this for biographical items (humans) as well as taxonomic. For example, a search for "Graemeloweus" on Mix'n'Match indicates that BugGuide taxon ID (P2464) and iNaturalist taxon ID (P3151) have already been matched and presumably attached to Graemeloweus (Q98557315) (the only option is "remove"). Yet said identifiers currently do not appear on Q98557315, nor is there any evidence they were ever added. Is there a "limbo" waiting area where properties have been auto-matched but not added (and if so, when and how does the logjam clear), or is this a bigger glitch? Animalparty (talk) 22:34, 22 October 2022 (UTC)
How to delete catalog
How can I delete a catalog? I uploaded a sample of a catalog to check if it worked and can't find out how to remove it. https://mix-n-match.toolforge.org/#/catalog/5586 Likevel (talk) 12:37, 17 November 2022 (UTC)
Usernames on catalogs
An altered version of my username is showing up on catalogs I created. Here's one of them. I'm not sure if this is a glitch, or a prank, or what, but I would like it to be fixed or removed. The Honorable (talk) 03:39, 23 December 2022 (UTC)
Does mix'n'match track which preliminary matches are not matches?
@Magnus Manske That would be valuable data to have for training better matching models. Lectrician1 (talk) 17:14, 12 January 2023 (UTC)
Catalog 2699
@Magnus Manske: Please connect mixnmatch:2699 to J-STAGE journal ID (P11504). 本日晴天 (talk) 11:49, 17 January 2023 (UTC)
- @本日晴天: Done (Mix’n’match admins can do that, not only Magnus). Also ran a manual sync, some ~1000 matches made on Wikidata are now reflected on Mix’n’match. There are some things that might need fixing, see sync/2699 Jean-Fred (talk) 09:06, 19 January 2023 (UTC)
Unable to add Mix-n-Match data
I am trying to add data from https://docs.google.com/spreadsheets/d/e/2PACX-1vRrqNBzhvmK_yX6E4DMID5PVZmT_W82-mELkhVbCp5VKsSbpqMdgELBV62J0la5cCvqRQ1QioRnd5pV/pub?output=csv to https://mix-n-match.toolforge.org/#/catalog/5714. However, every time I click import it says the data was imported but there's nothing in the catalog. RPI2026F1 (talk) 16:30, 5 January 2023 (UTC)
- Long time ago, but it seems that importing data is not instant and I need to wait a half day or so. RPI2026F1 (talk) 01:51, 7 February 2023 (UTC)
Random matching broken
Whenever I try to visit the random matching page (https://mix-n-match.toolforge.org/#/random/4771), I get this error:
vue.min.js:formatted:405 ReferenceError: get_catalog is not defined
at a.eval (eval at Tn (vue.min.js:formatted:2687:20), <anonymous>:3:86)
at t._render (vue.min.js:formatted:3423:23)
at a.r (vue.min.js:formatted:4202:29)
at Kr.get (vue.min.js:formatted:3025:29)
at Kr.run (vue.min.js:formatted:3079:26)
at ht (vue.min.js:formatted:622:15)
at Array.<anonymous> (vue.min.js:formatted:419:23)
at J (vue.min.js:formatted:412:17)
RPI2026F1 (talk) 01:50, 7 February 2023 (UTC)
Improve the explanation of how to import auxiliary data
I noticed there is a way to match using auxiliary data but I am not sure how to add it correctly. So far I have tried adding a new column in my CSV but I don't think it worked, and it's not really clear how I should go about doing it. RPI2026F1 (talk) 01:56, 7 February 2023 (UTC)
Auto-update for redirects
Sometimes, when I try to sync in changes from Wikidata to Mix'n'match, it will complain about a mismatch in items (where Wikidata says item x and Mix'n'match says item y). However, in most cases item y has been merged into item x and is now a redirect to it. Mix'n'match should either periodically resolve items to get rid of duplicates, or do it upon attempting to sync with Wikidata. RPI2026F1 (talk) 15:46, 8 February 2023 (UTC)
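The check described above can be sketched as follows. This is only an illustration of the idea, not how Mix'n'match works internally: the function names are hypothetical, and the `redirects` mapping (redirected item to its target) is assumed to have been obtained from Wikidata beforehand. The item IDs in the usage note are placeholders.

```python
from typing import Dict


def resolve_redirect(qid: str, redirects: Dict[str, str]) -> str:
    """Follow redirects until an item that is not a redirect is reached."""
    seen = set()
    # The 'seen' set guards against a cycle in the supplied mapping.
    while qid in redirects and qid not in seen:
        seen.add(qid)
        qid = redirects[qid]
    return qid


def is_real_mismatch(wikidata_qid: str, mnm_qid: str,
                     redirects: Dict[str, str]) -> bool:
    """Report a mismatch only if both sides still differ after resolving."""
    return resolve_redirect(wikidata_qid, redirects) != resolve_redirect(
        mnm_qid, redirects
    )
```

With `redirects = {"Q2": "Q1"}`, comparing `"Q1"` and `"Q2"` is no longer reported as a mismatch, while `"Q1"` and `"Q3"` still is.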
Bugged catalog
https://mix-n-match.toolforge.org/#/catalog/5758
This catalog isn't adding any of the IDs when I import from a file. I've tried both CSV and TSV, but all it does is run the job for a while and mark it as "DONE" even though nothing was added. Everything looks fine when I'm going to import them. There's about 20,000 entries. Wd-Ryan (talk) 18:16, 21 February 2023 (UTC)
Change default instance of for imported items?
I have imported data for catalog https://mix-n-match.toolforge.org/#/catalog/5714 and these represent humans. However the default data type is blank. How do I set it to be "human"? RPI2026F1 (talk) 03:39, 6 March 2023 (UTC)
Restrict automatches to specific P31?
Is there a way I can restrict automatch so it only matches items that have a specific value for instance of (P31)? RPI2026F1 (talk) 00:19, 30 March 2023 (UTC)
Service down
All API calls are returning a 503. Not sure how to notify the tool creator. RPI2026F1 (talk) 18:56, 7 April 2023 (UTC)
- Also in the case of a 503 the frontend on the homepage will infinitely repeat the failing request until it works. RPI2026F1 (talk) 18:57, 7 April 2023 (UTC)
- @RPI2026F1: I've notified the creator about this issue on his Wikidata talk page. ミラP@Miraclepine 23:04, 7 April 2023 (UTC)
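One common alternative to repeating a failing request indefinitely is a bounded retry with exponential backoff. The sketch below is in Python for illustration only (the tool's frontend is JavaScript). The function name, the default limits, and the idea of passing in a `fetch` callable that returns an HTTP status code are all assumptions.

```python
import time
from typing import Callable, Optional


def fetch_with_backoff(
    fetch: Callable[[], int],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[int]:
    """Call fetch() until it returns a non-503 status or attempts run out.

    Returns the first non-503 status, or None if every attempt got a 503.
    """
    for attempt in range(max_attempts):
        status = fetch()
        if status != 503:
            return status
        if attempt < max_attempts - 1:
            # Double the wait after each failure: 1s, 2s, 4s, ...
            sleep(base_delay * 2 ** attempt)
    return None
```

When `None` comes back, the caller can show an error message instead of continuing to send requests to a service that is down.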
problem on creating catalog #5965
I have tried to create a new catalog. Although the import job has finished, no entries appear and the catalog seems empty when I request the data download. I guess there is something wrong with the dataset I prepared, but I have no clue what. Can anyone analyze the situation? Mzaki (talk) 12:35, 11 June 2023 (UTC)
internal state broken
Mix-n-match does not update its internal database anymore. Any changes stay in a greyed out mode. However, you can still modify Wikidata via the website. Matthias (talk) 17:34, 16 July 2023 (UTC)
updating TMDB catalogues with additional IDs from id dumps
TMDB (Q20828898) now has a daily dump of IDs for Movies, TV Series, People, Collections, TV Networks, Keywords, and Production Companies, and offers a flag for which are adult content or not based on their flagging. I was able to contact Rohfle to update the TMDB company ID catalog, but haven't been successful in getting a response from @Gerwoman on TMDB tv id (1068), TMDB person id (1067), or TMDB movie ID (1066) to update the IDs here from the ID dump for matching. When attempting to upload the Movie ID TSV with IDs and movie titles, I see the file upload, but then no preview and no option to add additional IDs to the existing catalogs. There are 3+ million entries, but it is unclear what is stopping the adding of IDs from the front end. Can someone check what is stopping the update or, if it is the size, let me know how to provide the file for the update processing? Scraping is prohibited, but these dumps provide a clean means of matching. See also the discussion on TMDB regarding matching to Wikidata. Wolfgang8741 (talk) 16:59, 9 August 2023 (UTC)
Missing AI
Is it possible to add some more AI to this feature? For the occupation badminton player, I deleted around 100,000 suggestions while accepting only about 30 items. So please, if the sole occupation is badminton player, delete everything from the suggestions except Prabook and CWG. What would be needed is babelnet, flashscore, the-sportsorg, globalsportsarchive, Google knowledge. If there is no answer within 2 weeks, I would suggest putting this tool on a blacklist, disabling the tool or removing the tool. Florentyna (talk) 20:15, 13 September 2023 (UTC)
Scrub catalog 6044
Good morning, I uploaded a first instance of the ACMI Catalogue to #6044, but found that there was a formatting issue with the identifier. I have attempted to rectify this by reimporting, but it just aggregates the data. Could someone with the required permissions scrub it back to a fresh start so I can reimport the corrected version? Thank you. Pxxlhxslxn (talk) 23:34, 19 October 2023 (UTC)
Some catalogues with strange results
Mix'n'match is very helpful, and I especially like the random candidates https://mix-n-match.toolforge.org/#/creation_candidates/random_prop However, there are some problems with that that pop up all the time:
- https://mix-n-match.toolforge.org/#/catalog/4365 SHARE catalogue nearly always gives a dot at the end of the (unabridged) given name in the label field, which is a bit annoying since it makes a lot of work necessary to clean up after creation of the item.
- https://mix-n-match.toolforge.org/#/catalog/2050 VIAF Selected is very useful, but it gives wrong birth dates. Anyone born "in the 20th century" without a specific year is thought to be born in 1950 by that catalogue, and therefore I guess we must have hundreds of wrong "1950" birth entries. They should be all checked somehow, I think. (Surely this also applies to some other unspecific dates.)
- https://mix-n-match.toolforge.org/#/catalog/3849 NUKAT always fills the description field for Polish just with some numbers (VIAF etc.), which is not useful and needs to be cleaned up after every creation of an item with a NUKAT entry.
Maybe someone who knows how to fix these kinds of issues can do something about it. Thank you --Anvilaquarius (talk) 15:02, 26 October 2023 (UTC)
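For the date problem in the VIAF Selected item above, one option is to keep a century-only date at century precision rather than turning it into a specific year. The sketch below assumes the source text looks like "20th century" or a plain four-digit year. The actual format of the VIAF data, the function name and the return shape are all assumptions.

```python
import re
from typing import Optional, Tuple


def parse_birth_date(text: str) -> Optional[Tuple[str, int]]:
    """Return ('year', 1950) or ('century', 20); None if unrecognised."""
    text = text.strip()
    if re.fullmatch(r"\d{4}", text):
        return ("year", int(text))
    match = re.fullmatch(r"(\d{1,2})(?:st|nd|rd|th) century", text)
    if match:
        # Keep only the century; do not invent a mid-century year.
        return ("century", int(match.group(1)))
    return None
```

A caller could then store the century with a matching precision, or skip the date entirely, instead of recording 1950 as the birth year.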
Request to clear catalog #219
Catalog #219, "IAAF ID (track and field athletes)", is no longer useful because its data source is only populated with old-format deprecated IDs, with length under 6 characters. Right now they redirect to new-format IDs (length greater than 7 chars), but that might change in the future, and it would be counterproductive to fill Wikidata with old-format IDs that are no longer reachable via the web (except as a redirect). Also, in 2019 the IAAF changed its name to "World Athletics", so the catalog title should reflect that. --Habst (talk) 02:46, 30 October 2023 (UTC)
- Not sure who best to contact about this, but @Magnus Manske @Lucas Werkmeister, might you be able to help? --Habst (talk) 22:20, 2 July 2024 (UTC)
- I don’t think I can do anything here, sorry. Lucas Werkmeister (talk) 18:28, 3 July 2024 (UTC)