Connected Open Heritage/Wikidata migration/Documentation
DATA EXPLORATION
- Set up a milestone on Phabricator under Connected-Open-Heritage-Wikidata-migration, using the name of the database table, such as se-arbetsl.
- Set up a page under d:Wikidata:WikiProject_WLM/Mapping_tables
- Fill it out with sample data.
- Note: As of now, these are all created and filled out thanks to this script. It only needs to be rerun if a new table is added to the WLM db.
- Look at the unique identifier of each item. Does it correspond to an identifier in an external source?
- If yes, find or request an appropriate property.
- If no (i.e. the ID is just for internal WLM use), this might mean the dataset is not suitable for import. Without a real-world reference, we can't tell much about the completeness or selection criteria of the data.
- Identify heritage status. Do all the items represent the same type of heritage protection (e.g. national monument in <country>)?
- If not, how can the heritage status of each item be inferred?
- Create or edit any necessary items, for example cultural monument of the Czech Republic (Q385405). It should at least have a country assigned and be a subclass of cultural property / national heritage site.
- Identify P31
- A default P31 for all the items -- something basic like building or ancient monument.
- Sometimes there's a separate column for this, like type, that can be used to substitute the default one if possible.
- Create necessary lookup tables.
- Some fields have a limited range of distinct values, for example se-fornmin_(sv)/types.
- In SQL, you can check it using: select distinct(columnname) from tablename;
- The script for this is here.
- Focus on mapping the most common values first.
- Identify and download any necessary offline data.
- This avoids doing live queries while the program is running, which takes a lot of time.
- Usually things like place names and administrative units.
- Data that does not change often.
- Identify areas that can benefit from community input.
- Problematic due to language.
- Problematic due to lack of factual knowledge.
- Labels and descriptions
- Can the name column be used as-is for the label?
- Descriptions can be made using the default P31/heritage and country/administrative location.
- Descriptions in extra languages, apart from the language of the dataset?
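As a rough illustration of the lookup-table step above, the following sketch counts how often each distinct value occurs in a column, so that the most common values can be mapped first. It uses an in-memory SQLite database purely as a stand-in for the WLM database; the table name monuments, the column type, and the sample rows are made up for the example, and identifier quoting may differ on the real database.

```python
import sqlite3


def value_frequencies(conn, table, column):
    """Return (value, count) pairs for a column, most common first.

    Table and column names cannot be bound as SQL parameters, so they are
    interpolated here; only pass trusted names.
    """
    query = (
        f'SELECT "{column}", COUNT(*) AS n FROM "{table}" '
        f'GROUP BY "{column}" ORDER BY n DESC, "{column}" ASC'
    )
    return conn.execute(query).fetchall()


# Hypothetical sample data standing in for a WLM table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE monuments (id TEXT, type TEXT)")
conn.executemany(
    "INSERT INTO monuments VALUES (?, ?)",
    [("1", "church"), ("2", "church"), ("3", "mill"),
     ("4", "church"), ("5", "bridge"), ("6", "mill")],
)

frequencies = value_frequencies(conn, "monuments", "type")
for value, count in frequencies:
    print(f"{count:>3}  {value}")
```

Sorting by count puts the values worth mapping first at the top of the list; rare values can be left to the default P31 until someone has time to map them.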
CODING
- Create a basic mapping file like this one.
- Contains data that apply to all the items.
- If possible, use a unique property (for ID number) that will be used in addition to monument_article to see whether an item might already exist.
- Create statements for all relevant columns.
- All statements have a source -- see phab:T155241.
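The steps above could look roughly like the following sketch, which is not the project's actual code or mapping-file format. It applies dataset-wide defaults to every row, swaps in a more specific P31 when a lookup table knows the value of the type column, and attaches the same reference to every statement. P31, P17, P1435, P248 and P813 are existing Wikidata properties; the row layout, the MAPPING structure, the function names, the example date, and the placeholder item IDs such as Q_DATASET are assumptions made for illustration.

```python
from datetime import date

# Hypothetical mapping: defaults that apply to every item in the dataset,
# plus a lookup table for the `type` column. Q385405 is the example item
# from the text; the Q_* values are placeholders, not real item IDs.
MAPPING = {
    "defaults": {
        "P17": "Q_COUNTRY",       # country
        "P1435": "Q385405",       # heritage designation
        "P31": "Q_DEFAULT_TYPE",  # instance of (fallback)
    },
    "type_lookup": {"church": "Q_CHURCH", "mill": "Q_MILL"},
    "source_item": "Q_DATASET",
}


def make_reference(mapping, retrieved):
    """Build the reference attached to every statement."""
    return {"P248": mapping["source_item"], "P813": retrieved.isoformat()}


def build_statements(row, mapping, retrieved):
    """Turn one database row into a list of sourced statements."""
    values = dict(mapping["defaults"])
    # Use the more specific type when the lookup table knows it.
    specific = mapping["type_lookup"].get(row.get("type"))
    if specific:
        values["P31"] = specific
    reference = make_reference(mapping, retrieved)
    return [
        {"property": prop, "value": value, "reference": reference}
        for prop, value in sorted(values.items())
    ]


row = {"id": "123", "name": "Example mill", "type": "mill"}
statements = build_statements(row, MAPPING, date(2017, 1, 1))
for statement in statements:
    print(statement["property"], statement["value"])
```

Building the reference in one place makes it hard to create a statement without a source by accident.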
UPLOADING
- Create a page with a preview of the processed data.
- Example: se-ship_(sv)/preview
- Request for permission
- Link to preview
- Describe how data is processed.
- Describe how already existing items are detected.
- Test upload of ~10 items.
- Upload of dataset.
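The detection of existing items and the small test upload could be sketched as follows. This is only an illustration under assumptions: the two indexes are imagined to be built beforehand from offline data, monument_article is the column mentioned in the coding section, and the field name id, the function names, and the sample values are hypothetical. No actual upload is performed.

```python
def find_existing_item(row, items_by_id, items_by_article):
    """Return the ID of an item that probably already exists, or None.

    The external identifier is checked first; the linked article is the
    fallback when no identifier match is found.
    """
    match = items_by_id.get(row.get("id"))
    if match is None and row.get("monument_article"):
        match = items_by_article.get(row["monument_article"])
    return match


def split_test_batch(rows, size=10):
    """Split rows into a small test batch and the remainder."""
    return rows[:size], rows[size:]


# Hypothetical indexes and rows; the Q_* values are placeholders.
items_by_id = {"A1": "Q_ONE"}
items_by_article = {"Example church": "Q_TWO"}
rows = [
    {"id": "A1", "monument_article": ""},
    {"id": "A2", "monument_article": "Example church"},
    {"id": "A3", "monument_article": ""},
]

for row in rows:
    existing = find_existing_item(row, items_by_id, items_by_article)
    action = f"update {existing}" if existing else "create new item"
    print(row["id"], "->", action)

# A smaller batch size is used here only because the sample has three rows.
test_batch, remainder = split_test_batch(rows, size=2)
```

Reviewing the results of the test batch before processing the remainder gives a chance to catch mapping mistakes while they are still cheap to undo.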