User:A ka es/OpenRefine/wikimania2019 postersession

From Meta, a Wikimedia project coordination wiki

Poster Session at #wikimania2019 - empower yourself: first steps

Description File
* Wikimania 2019 - Poster Session
* The Magic of OpenRefine
The Real Magic of OpenRefine

Installation

Description Screenshot
Sources: Linux kit, Mac kit, Windows kit

Documentation for users, Installation Instructions:
"... it runs as a small web server on your own computer and you point your web browser at that web server in order to use Refine. So, think of Refine as a personal and private web application." Installation Instructions
start desktop

Acquiring Data

stored on your own computer

Source for data examples: (the-nerd.be)

Note: You can select and upload more than one file at a time (this is easier if the files are in the same directory on your computer). This works well if all files have the same data structure.

"flat" data formats like .csv, .tsv, .xls, .xlsx, .ods

Description Screencast
* OpenRefine start page
* left column: select "Create Project"
* select "Get data from - This Computer"
* main column: click the "Browse..." button
* choose the file from your local directory
* click "Next"
* the data is uploaded and a preview appears
* choose the data format (below the columns, on the left side; it is usually detected automatically)
* check the options below the columns, try them out until the preview looks right, and update the preview
* if everything fits: name the project and set a tag (fields above the columns)
* click the "Create Project" button on the right side
import "flat" data formats
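The preview step above depends on guessing how each line splits into columns. The idea can be sketched in Python; this is an illustration only, not OpenRefine's detection logic, and the names `guess_delimiter` and `preview_rows` as well as the sample data are assumptions made for the example.

```python
import csv
import io

def guess_delimiter(text, candidates=(",", ";", "\t")):
    """Pick the candidate that occurs equally often (and most often) in every line."""
    lines = [line for line in text.splitlines() if line]
    best, best_count = candidates[0], 0
    for delim in candidates:
        counts = {line.count(delim) for line in lines}
        # A plausible delimiter shows up the same number of times in each line.
        if len(counts) == 1:
            count = counts.pop()
            if count > best_count:
                best, best_count = delim, count
    return best

def preview_rows(text, max_rows=5):
    """Return the guessed delimiter and the first rows as dictionaries."""
    delimiter = guess_delimiter(text)
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    rows = []
    for row in reader:
        rows.append(dict(row))
        if len(rows) >= max_rows:
            break
    return delimiter, rows

# Invented sample: a semicolon-separated file with a header line.
sample = "name;country\nAnna;DE\nBen;FR\n"
delimiter, rows = preview_rows(sample)
```

If the guess is wrong, the fix is the same as in the preview: set the separator by hand and look at the rows again.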

structured data formats like .xml, .json

Description Screencast
* OpenRefine start page
* left column: select "Create Project"
* select "Get data from - This Computer"
* main column: click the "Browse..." button
* choose the file from your local directory
* click "Next"
* specify the record path in the preview window (hover over the curly brackets and click the element that contains all the data you need)
* check the preview; if something is missing, push the "Please specify a record path first" button and start again
* if everything fits: name the project and set a tag (fields above the columns)
* click the "Create Project" button on the right side
import .json
import .xml
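Choosing a record path means deciding which nested element becomes one row. A minimal Python sketch of that idea follows; it is not OpenRefine's importer. The helper names, the dotted column names, and the sample document are assumptions for illustration, and OpenRefine names the resulting columns differently.

```python
import json

def rows_at_path(document, record_path):
    """Walk down the given keys and return the list of records found there."""
    node = document
    for key in record_path:
        node = node[key]
    if not isinstance(node, list):
        raise ValueError("record path must point at a list of records")
    return node

def flatten(record, prefix=""):
    """Turn nested objects into flat column names such as 'party.name'."""
    flat = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=name + "."))
        else:
            flat[name] = value
    return flat

# Invented sample document: the records live under the "data" key.
raw = """
{"meta": {"count": 2},
 "data": [{"id": 1, "party": {"name": "A"}},
          {"id": 2, "party": {"name": "B"}}]}
"""
rows = [flatten(r) for r in rows_at_path(json.loads(raw), ["data"])]
```

Picking `["meta"]` instead of `["data"]` would raise an error here, which mirrors the situation where the preview does not show the expected rows and the path has to be chosen again.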

special case .html

Description
* open the .html file in a browser
* copy the table structure
* paste it into the clipboard window of OpenRefine

(see the next section)

copy & paste from tables

Description Screencast
* copy a table structure from a source (website, .pdf file, text file, spreadsheet, etc.)
* OpenRefine start page
* left column: select "Create Project"
* select "Get Data from - Clipboard"
* paste the copied table structure into the clipboard window
* click the "Next" button below
* the data is uploaded and a preview appears
* choose the data format (below the columns, on the left side; it is usually detected automatically)
* check the options below the columns, try them out until the preview looks right, and update the preview
* if everything fits: name the project and set a tag (fields above the columns)
* click the "Create Project" button on the right side
copy & paste from tables

load data via API or URL

Source for data examples: abgeordnetenwatch.de API parliaments

Note: You can request more than one URL at a time: push the "Add Another URL" button and enter the next URL. Once all URLs are entered, push the "Next" button. This works well if you are sure that the data behind all URLs has the same structure.

Description Screencast
* OpenRefine start page
* left column: select "Create Project"
* select "Get data from - Web Addresses (URLs)"
* paste or type the URL into the field
* click "Next"
* the next step depends on the data format: select a record path or check the options
* if everything fits: name the project and set a tag (fields above the columns)
* click the "Create Project" button on the right side
import from a single URL
import from several URLs at the same time
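Why the structure behind several URLs (or several files) should be identical can be shown with a small Python sketch. It works on already downloaded text and makes no network requests. The function name `combine_sources`, the extra `source` column, the strict header check, and the example.org addresses are all assumptions made for this illustration; they do not describe how OpenRefine handles differing files.

```python
import csv
import io

def combine_sources(sources):
    """sources maps a name (URL or file name) to CSV text; returns all rows in one list."""
    combined = []
    expected_header = None
    for name, text in sources.items():
        reader = csv.DictReader(io.StringIO(text))
        header = reader.fieldnames
        if expected_header is None:
            expected_header = header
        elif header != expected_header:
            # Different columns would produce a ragged, hard-to-use table.
            raise ValueError(f"{name} has a different column layout: {header}")
        for row in reader:
            # Remember where each row came from.
            combined.append({"source": name, **row})
    return combined

# Invented responses standing in for two downloads with the same structure.
responses = {
    "https://example.org/a.csv": "name,city\nAnna,Berlin\n",
    "https://example.org/b.csv": "name,city\nBen,Paris\n",
}
combined = combine_sources(responses)
```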

Exploring Data

Description Screencasts
Once you have a data project in OpenRefine, you can explore and edit its content in many ways; the easiest are facets and filters.
facets
editing directly in facets
filter
You can cluster values to find errors and inconsistencies and to correct them.
clustering values
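What a text facet and a key-collision cluster compute can be sketched in a few lines of Python. This is a simplified illustration in the spirit of OpenRefine's "fingerprint" keying, not its actual implementation (which, for example, also normalizes accented characters). The function names and the sample values are assumptions.

```python
import string
from collections import Counter, defaultdict

def text_facet(values):
    """Count how often each distinct value occurs, most frequent first."""
    return Counter(values).most_common()

def fingerprint(value):
    """Simplified key: trim, lowercase, drop punctuation, sort unique tokens."""
    cleaned = value.strip().lower()
    cleaned = cleaned.translate(str.maketrans("", "", string.punctuation))
    return " ".join(sorted(set(cleaned.split())))

def cluster(values):
    """Group different spellings that share the same fingerprint."""
    groups = defaultdict(set)
    for value in values:
        groups[fingerprint(value)].add(value)
    return [sorted(group) for group in groups.values() if len(group) > 1]

# Invented sample column with inconsistent spellings.
parties = ["Green Party", "green party ", "Party, Green", "SPD", "SPD", "S.P.D."]
facet = text_facet(parties)
clusters = cluster(parties)
```

Each cluster is a candidate for merging into one spelling; whether the variants really mean the same thing still needs a human decision.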

Preparing Data

Description Screencast
The file in the example came with the following note:
"Brussels phone numbers start with +32(0)228 45; change the 5 to 9 for the fax.
Strasbourg phone numbers start with +33(0)388 1 75; again, change the 5 to 9 for the fax."

We have to create the fax numbers from the phone numbers and remove the "@" from the Twitter user names.
enriching and changing data
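The two transformations can be written down as a short Python sketch. It assumes that each phone cell holds the full number as text, starting with one of the prefixes quoted in the note; the function names and the sample numbers are invented for illustration, and in OpenRefine itself the same change would be made with a column transformation.

```python
# Phone prefix -> fax prefix, following the note: change the final 5 to 9.
PHONE_TO_FAX_PREFIX = {
    "+32(0)228 45": "+32(0)228 49",      # Brussels
    "+33(0)388 1 75": "+33(0)388 1 79",  # Strasbourg
}

def fax_from_phone(phone):
    """Return the fax number for a known phone prefix, or None if the prefix is unknown."""
    for phone_prefix, fax_prefix in PHONE_TO_FAX_PREFIX.items():
        if phone.startswith(phone_prefix):
            return fax_prefix + phone[len(phone_prefix):]
    return None

def clean_twitter(handle):
    """Remove surrounding whitespace and the leading '@' from a Twitter user name."""
    return handle.strip().lstrip("@")

# Invented sample values (assumed cell format: prefix followed by the extension).
brussels_fax = fax_from_phone("+32(0)228 45123")
strasbourg_fax = fax_from_phone("+33(0)388 1 75123")
handle = clean_twitter("@example_mep")
```

Only the prefix is rewritten, so a 5 that happens to occur later in the extension stays untouched.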

Combining Data

Description Screencast
There are two OpenRefine projects: the file from the European Parliament, enriched with the Q-numbers of the MEPs, and the result of a Wikidata query.
We want to combine both to find out which MEPs have a parliamentary term entry in Wikidata and where the gaps are.
We use the Q-numbers as the key.
combining two projects
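The logic of this combination is a lookup by key. A minimal Python sketch, assuming both projects are available as lists of rows with a shared `qid` column; the column name, the function name, and the Q-numbers below are placeholders invented for the example, not real MEP data.

```python
def find_gaps(meps, wikidata_rows, key="qid"):
    """Split the MEP rows into those found in the Wikidata result and those missing."""
    with_term = {row[key] for row in wikidata_rows}
    matched = [mep for mep in meps if mep[key] in with_term]
    gaps = [mep for mep in meps if mep[key] not in with_term]
    return matched, gaps

# Placeholder rows: project 1 (Parliament file) and project 2 (Wikidata query result).
meps = [
    {"name": "MEP A", "qid": "Q-PLACEHOLDER-1"},
    {"name": "MEP B", "qid": "Q-PLACEHOLDER-2"},
    {"name": "MEP C", "qid": "Q-PLACEHOLDER-3"},
]
wikidata_rows = [
    {"qid": "Q-PLACEHOLDER-1", "term": "9th European Parliament"},
    {"qid": "Q-PLACEHOLDER-3", "term": "9th European Parliament"},
]
matched, gaps = find_gaps(meps, wikidata_rows)
```

The rows in `gaps` are the MEPs for whom the term entry would still have to be added in Wikidata.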

Exporting Data

Description Screencast
You can export your data with one click to many formats: as an OpenRefine project to share with others, as common spreadsheet formats, as csv/tsv, or as an HTML file. You can also use an exporter to choose which columns to include.
data export
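Exporting a selection of columns boils down to writing only the chosen fields, in the chosen order. A small Python sketch of that idea using the standard `csv` module; the function name `export_rows` and the sample row are assumptions, and this does not reproduce OpenRefine's exporter.

```python
import csv
import io

def export_rows(rows, columns, delimiter=","):
    """Write only the selected columns as delimited text and return it as a string."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=columns,
        delimiter=delimiter,
        extrasaction="ignore",  # silently skip columns that were not selected
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()

# Invented sample row; export only two of the three columns as tsv.
rows = [{"name": "Anna", "fax": "+32(0)228 49123", "twitter": "example_mep"}]
tsv = export_rows(rows, ["name", "twitter"], delimiter="\t")
```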

"Magic" (Bonus)

regex

Description Screencast
first impressions ... (content and screencast are coming soon)

GREL

Description Screencast
first impressions ... (content and screencast are coming soon)

reconciliation services

Description Screencast
first impressions ... (content and screencast are coming soon)

"about" section => editing meta data

Description Screencast
If you work with many projects, using the metadata and tags to organize them is very useful. In case you missed this function: you can edit both in the "about" section of each project.
organizing OR-projects