Wikimaps proposal for maps georeferencing architecture in Wikimedia projects

From Meta, a Wikimedia project coordination wiki

Specification for a data model to store georectification data

Background

Georeferencing (georectification, warping) is a type of coordinate transformation: a process that aligns scanned maps with a spatial reference system, so that the map image can be displayed as a tiled web map. Georeferencing works by finding pairs of ground control points (GCPs). Each pair links a point on the scanned (raster) map to the corresponding coordinates in a digital map or aerial image that is already georeferenced. With this information, the georeferencing algorithm distorts (warps, rectifies) the raster map to match the geometry of the spatial reference system.
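To illustrate how GCP pairs drive the warp, the sketch below fits a first-order (affine) transformation by least squares in plain Python. It is not how Wikimaps Warper or GDAL implement warping. Real tools also offer higher-order polynomial and thin plate spline transformations. The function names `fit_affine` and `apply_affine` are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Params = Tuple[List[float], List[float]]


def _solve3(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("GCPs are collinear; the affine fit is undetermined")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, 3):
            factor = m[r][col] / m[col][col]
            for c in range(col, 4):
                m[r][c] -= factor * m[col][c]
    x = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):  # back substitution
        x[r] = (m[r][3] - sum(m[r][c] * x[c] for c in range(r + 1, 3))) / m[r][r]
    return x


def fit_affine(pixels: Sequence[Point], coords: Sequence[Point]) -> Params:
    """Least-squares fit of X = a0 + a1*px + a2*py and Y = b0 + b1*px + b2*py."""
    if len(pixels) != len(coords) or len(pixels) < 3:
        raise ValueError("need at least 3 matching GCP pairs")
    ata = [[0.0] * 3 for _ in range(3)]  # A^T A, with design rows [1, px, py]
    atx = [0.0] * 3                      # A^T X
    aty = [0.0] * 3                      # A^T Y
    for (px, py), (cx, cy) in zip(pixels, coords):
        row = (1.0, px, py)
        for i in range(3):
            atx[i] += row[i] * cx
            aty[i] += row[i] * cy
            for j in range(3):
                ata[i][j] += row[i] * row[j]
    return _solve3(ata, atx), _solve3(ata, aty)


def apply_affine(params: Params, pixel: Point) -> Point:
    """Map an image pixel to a target coordinate using fitted parameters."""
    (a0, a1, a2), (b0, b1, b2) = params
    px, py = pixel
    return a0 + a1 * px + a2 * py, b0 + b1 * px + b2 * py
```

An affine fit needs at least three GCPs that do not lie on one line. With more GCPs, least squares spreads small placement errors over all points instead of fitting any single pair exactly.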

Wikimaps Warper is an app for georeferencing old maps. It is based on MapWarper, originally created for the New York Public Library, and was adapted for the Wikimedia environment.

Other software that performs similar operations includes Klokan Technologies' Georeferencer, used by the British Library for its maps, and Allmaps. Desktop GIS software such as QGIS also has georeferencing capabilities.

What is this proposal about?

Wikimaps Warper stores its data in its own database. If this data, and the data produced by other georeferencing tools, were stored in the Wikimedia projects, it could be made available to developers of more lightweight tools. However, the community has not yet reached consensus on the format and scope of the data.

Proposed features

Create a combined Wikidata property and a single dataset instead of 3 separate properties and datasets

Description

The old proposal suggested creating three Wikidata properties to hold all georeferencing data. Their values would be Wikimedia Commons tabular data, i.e. data table files, which are JSON files under the hood.

  1. Dataset for the georeferencing control point data. This would include the control point pairs, each linking a point on the scanned map to a point in the coordinate system.
  2. Dataset for the georeferencing mask geoshape. This would include the coordinate points of the bounding box that outlines the scanned map's coverage in the coordinate system.
  3. Dataset for the georeferencing pixel mask data. This would include the coordinates of points on the raster map image, representing a mask that covers the parts of the map sheet outside the map image.
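For comparison, here is a sketch of how the first of these datasets could be serialized as Commons tabular data: a `Data:*.tab` page with `license`, `description`, `sources`, `schema` and `data` members. The column names `pixel_x`, `pixel_y`, `lon` and `lat`, the `CC0-1.0` license and the function name are assumptions. The old proposal did not define them.

```python
import json
from typing import Iterable, Tuple

Gcp = Tuple[float, float, float, float]  # (pixel_x, pixel_y, lon, lat)


def gcp_tabular_data(gcps: Iterable[Gcp], source: str) -> str:
    """Serialize GCP pairs as a Commons tabular data JSON document (sketch)."""
    doc = {
        "license": "CC0-1.0",
        "description": {"en": "Georeferencing control points"},
        "sources": source,
        "schema": {
            "fields": [
                # Hypothetical column names; the old proposal left these open.
                {"name": "pixel_x", "type": "number", "title": {"en": "Image x (px)"}},
                {"name": "pixel_y", "type": "number", "title": {"en": "Image y (px)"}},
                {"name": "lon", "type": "number", "title": {"en": "Longitude"}},
                {"name": "lat", "type": "number", "title": {"en": "Latitude"}},
            ]
        },
        "data": [list(row) for row in gcps],
    }
    return json.dumps(doc, indent=2)
```

The mask and the pixel mask would each need another page of this kind, so one map ends up spread over three data pages.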

Argumentation & open questions

Some of the following concerns should be moved to the discussion about the GeoJSON format once the community has agreed to create a single Wikidata property referencing a GeoJSON file.

🌟 Multichill proposes instead a single combined map data (GeoJSON) file on Wikimedia Commons, with distinct features for the GCPs, the mask, and the pixel mask.[1] See his proposal and example.

🌟 Bert Spaan notes that GeoJSON only supports WGS 84, while the GDAL transformation can use any coordinate reference system (projection).[2]

💬 TuukkaH suggests that we can still support additional coordinate systems if our spec says so, since the GeoJSON spec states: "However, where all involved parties have a prior arrangement, alternative coordinate reference systems can be used without risk of data being misinterpreted."[3]
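RFC 7946 defines GeoJSON positions as [longitude, latitude] in WGS 84 decimal degrees. Even under a "prior arrangement", a consumer still needs some way to notice data in another CRS. A plain range check, sketched below with hypothetical function names, catches obviously non-geographic values such as projected coordinates in metres. It cannot prove that in-range values really are WGS 84.

```python
from typing import Iterator, List, Sequence


def iter_positions(coords) -> Iterator[Sequence[float]]:
    """Yield every position from nested GeoJSON coordinate arrays."""
    if coords and isinstance(coords[0], (int, float)):
        yield coords
    else:
        for item in coords:
            yield from iter_positions(item)


def out_of_wgs84_range(feature_collection: dict) -> List[Sequence[float]]:
    """Return positions that cannot be WGS 84 [longitude, latitude] pairs."""
    bad = []
    for feature in feature_collection.get("features", []):
        geometry = feature.get("geometry") or {}
        for pos in iter_positions(geometry.get("coordinates") or []):
            lon, lat = pos[0], pos[1]
            if not (-180.0 <= lon <= 180.0 and -90.0 <= lat <= 90.0):
                bad.append(pos)
    return bad
```

Swapped [latitude, longitude] pairs usually stay within range and pass this check. The check therefore complements an explicit statement of the CRS rather than replacing it.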

🌟 Bert Spaan further notes that a GeoJSON polygon may contain holes, while a georeferencing mask does not, and georeferencing software (typically?) does not support holes.[2]
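If the spec settles on hole-free masks, a consumer could reject mask Polygons that have interior rings. Below is a minimal check with a hypothetical function name. It follows the RFC 7946 rule that a linear ring is closed and has at least four positions.

```python
from typing import List


def mask_problems(geometry: dict) -> List[str]:
    """Return reasons a geometry is unusable as a hole-free mask; [] if it is fine."""
    if geometry.get("type") != "Polygon":
        return ["mask geometry must be a Polygon"]
    rings = geometry.get("coordinates") or []
    if not rings:
        return ["Polygon has no rings"]
    problems = []
    if len(rings) > 1:
        # In GeoJSON, every ring after the first is a hole (interior ring).
        problems.append(f"{len(rings) - 1} interior ring(s) found; masks cannot have holes")
    exterior = rings[0]
    if len(exterior) < 4:
        problems.append("exterior ring needs at least 4 positions")
    elif list(exterior[0]) != list(exterior[-1]):
        problems.append("exterior ring is not closed (first and last positions differ)")
    return problems
```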

🌟 Would the single-file approach cover cases where a scanned map consists of multiple maps (e.g. map sheets, inset maps)?

💬 Bert: This could be done by allowing multiple georectified maps in a single JSON maps object (refers to the original JSON Schema proposal).[2]

💬 Jheald notes that qualifiers may be needed to express complex cases, e.g. those with multiple sets of data.[1]

💬 Multichill counter-proposes that any single FeatureCollection can only use a single image mask and set of control points.[4]

💬 TuukkaH proposes solutions to cater for multiple maps per image:

  1. We can crop and split the original image into multiple source images.
  2. We can link multiple GeoJSON files to a single source image.
  3. We can allow multiple FeatureCollections in one GeoJSON file (with a top-level FeatureCollection wrapping the others).
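Option 1 has a side effect. Cropping moves the pixel origin, so GCP and mask pixel coordinates authored on the original image must be shifted by the crop offset before they apply to a cropped file. The sketch below assumes a top-left pixel origin and a crop box given as (left, top, right, bottom) in source-image pixels. `shift_to_crop` is a hypothetical name.

```python
from typing import List, Sequence, Tuple

Pixel = Tuple[float, float]


def shift_to_crop(points: Sequence[Pixel],
                  crop_box: Tuple[float, float, float, float]) -> List[Pixel]:
    """Re-express source-image pixel points in the coordinate frame of a crop.

    Assumes a top-left origin. Raises if a point falls outside the crop box,
    since such a GCP or mask point would belong to a different sub-map.
    """
    left, top, right, bottom = crop_box
    shifted = []
    for px, py in points:
        if not (left <= px <= right and top <= py <= bottom):
            raise ValueError(f"point {(px, py)} lies outside crop box {crop_box}")
        shifted.append((px - left, py - top))
    return shifted
```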

💬 The IIIF Georeference Extension resolves this issue by allowing a resource to be georeferenced with multiple Georeference Annotations, each with its own SVG Selector and GCPs.[5]

💬 Susannaanas notes that the chosen approach must be known in order to define the constraints of the Wikidata property.

Proposed conclusion

Add the updated Wikidata property proposal here, or the information needed to create it. All properties are debatable until agreed on.

Name of the property: Georeferencing data
Description: Format for rectifying images, specifically maps. The format is backwards compatible with GeoJSON.
Represents:
Data type: geo-shape
Domain: Commons image
Allowed values: Data:.*\.map
Example 1: Data:Georectification.example2.geojson.map (Multichill's original proposal)
Example 2:
Example 3:
Source: Wikimaps Warper; external tools and sites such as NYPL MapWarper, British Library Georeferencer, David Rumsey maps and other sites with georeferencing; manual user input and uploads, as well as batch upload and editing tools.
Planned use: Transfer data from Wikimaps Warper and external tools to Wikimedia projects, and make the data available for microservices on Wikimedia projects.
See also: Wikidata:Property_proposal/external_georeferencer_URL
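The Allowed values pattern can be tried out before it becomes a format constraint. This sketch assumes that, as with Wikidata format constraints, the pattern must match the whole value. `is_allowed_value` is a hypothetical name.

```python
import re

# Pattern from the "Allowed values" field; fullmatch anchors it at both ends.
ALLOWED_VALUES = re.compile(r"Data:.*\.map")


def is_allowed_value(page_title: str) -> bool:
    """Check a Commons page title against the proposed Allowed values pattern."""
    return ALLOWED_VALUES.fullmatch(page_title) is not None
```

Since `.*` matches any characters, the pattern only constrains the `Data:` prefix and the `.map` suffix. It accepts `Data:Georectification.example2.geojson.map` and rejects pages such as `Data:Example.tab` or `File:Example.jpg`.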

Terminology

Proposed terminology for the Wikimedia proposal:

Agreed term: georeferencing. Alternatives: georectifying, georectification, warping, georegistration.
Agreed term: imagemask. Alternatives: mask, crop, cutline.

Relation to Allmaps and IIIF

🌟 Jheald suggests it would be useful to have a WMF-maintained IIIF service that supports tiles. He is not sure whether the current Commons offering supports tiles, or whether Allmaps can be made to work with a map from Commons.

💬 Susannaanas notes that WMF's plans to support an IIIF service are no longer being pursued. Abbe98 notes that Allmaps' approach of rendering/warping client-side matters more for Wikimedia use than its use of IIIF.

Additional metadata not included in the scope of this proposal

🌟 Jheald notes that:

  • The georeferencing apps may hold side data that we might want to store alongside the GCPs, e.g. additional data that MapWarper stores.
  • The pixel dimensions of a Wikimedia Commons image may change (for example, when a new version of the file is uploaded), and the related image coordinates then change as well. For this reason, it would be useful to make sure that GCPs relate to the same revision of a file (or at least one with the same dimensions) as the version being served.

🌟 TuukkaH raises a concern with GeoJSON: where to put extra metadata. He proposes simply extending the top-level FeatureCollection with these metadata fields as GeoJSON "foreign members".
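TuukkaH's suggestion can be sketched as follows: metadata goes into extra top-level members of the FeatureCollection, which RFC 7946 calls foreign members. Tools that do not know about them typically ignore them, and some may drop them when rewriting a file. The helper and the member name used in the example (`georeferencing_tool`) are hypothetical, not agreed fields.

```python
from typing import Any, Dict

RESERVED = {"type", "features", "bbox"}  # members GeoJSON defines for a FeatureCollection


def add_metadata(feature_collection: Dict[str, Any], **metadata: Any) -> Dict[str, Any]:
    """Return a copy of the FeatureCollection with metadata as foreign members."""
    clash = RESERVED.intersection(metadata)
    if clash:
        raise ValueError(f"metadata would overwrite GeoJSON members: {sorted(clash)}")
    return {**feature_collection, **metadata}
```

For example, `add_metadata(fc, georeferencing_tool="Wikimaps Warper")` returns a FeatureCollection that standard GeoJSON readers still accept, since `type` and `features` are unchanged. The original dictionary is not modified.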

Proposed GeoJSON schema

Description

🌟 Bert Spaan notes that this schema could be defined using JSON Schema.

🌟 Original proposal by the team at the Wikimania 2019 hackathon.[6]

🌟 Initial text for the specification drafted by TheDJ:

A GeoRectify GeoJSON is a format for image rectification information, specifically for maps, that is backwards compatible with GeoJSON. As such, its geographic parts can be displayed with any GeoJSON tool. A GeoRectify GeoJSON is specified as follows:

  1. At least one FeatureCollection
  2. This FeatureCollection has at least one Polygon feature and 0 or more Point features.
  3. The Polygon feature describes the image that has been rectified.
    1. The properties of the Polygon feature must contain a file attribute and a transformation attribute. It MAY contain the attributes: sha1, commons_entity
      1. type: ImageMask (do we need this?)
      2. file: simple filename of the image file this georectification belongs to (or should this be a URL?)
      3. transformation: we need to clarify why this information is needed and what exactly it means.
      4. sha1: SHA-1 checksum of the file. Use this to make sure that the file for which this georectification information was authored is exactly the same as the one used later on. If SHA-1 later becomes so broken that another hash is needed, add "sha256" as an attribute.
      5. commons_entity: Commons data media entity
      6. unit: pixel. The default unit is pixels of the image before rectification; alternative units are currently not supported.
    2. There is an array attribute named "cutline". For each coordinate in the Polygon geometry, this array has a corresponding point (an array of 2 numbers) in the image. This defines which parts of the image are cropped (to be confirmed).
  4. The Points describe the ground control points (GCPs) of the rectification.
  5. Each Point has a geometry coordinate. For each Point there is a corresponding point in the image, specified in an attribute named "controlpoint", which is an array of 2 numbers.
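To make the numbered rules above concrete, here is a sketch that assembles a document following them and checks an existing document against them. Like the draft JSON Schema on this page, it assumes that `cutline` and `controlpoint` are feature-level members, because the spec text does not say exactly where they sit. The `affine` transformation type, the `unit` value and the function names are also assumptions.

```python
from typing import Any, Dict, List, Sequence, Tuple

LonLat = Tuple[float, float]
Pixel = Tuple[float, float]


def build_georectify(file: str, mask: Sequence[Tuple[LonLat, Pixel]],
                     gcps: Sequence[Tuple[LonLat, Pixel]]) -> Dict[str, Any]:
    """Assemble a GeoRectify GeoJSON FeatureCollection from (lon/lat, pixel) pairs."""
    if len(mask) < 3:
        raise ValueError("a mask needs at least 3 corners")
    ring = [list(geo) for geo, _ in mask]
    cutline = [list(px) for _, px in mask]
    if ring[0] != ring[-1]:  # GeoJSON rings are closed; keep cutline in step
        ring.append(ring[0])
        cutline.append(cutline[0])
    features = [{
        "type": "Feature",
        "properties": {"file": file, "transformation": {"type": "affine"}, "unit": "pixel"},
        "geometry": {"type": "Polygon", "coordinates": [ring]},
        "cutline": cutline,  # assumption: feature-level member, as in the draft schema
    }]
    for geo, px in gcps:
        features.append({
            "type": "Feature",
            "properties": {},
            "geometry": {"type": "Point", "coordinates": list(geo)},
            "controlpoint": list(px),  # image pixel matching this geographic point
        })
    return {"type": "FeatureCollection", "features": features}


def spec_problems(fc: Dict[str, Any]) -> List[str]:
    """Check a document against the numbered rules of the draft spec."""
    if fc.get("type") != "FeatureCollection":
        return ["top level must be a FeatureCollection"]
    problems = []
    features = fc.get("features", [])
    kinds = [(f.get("geometry") or {}).get("type") for f in features]
    if "Polygon" not in kinds:
        problems.append("at least one Polygon feature is required")
    for feature, kind in zip(features, kinds):
        if kind == "Polygon":
            props = feature.get("properties") or {}
            for key in ("file", "transformation"):
                if key not in props:
                    problems.append(f"Polygon properties lack '{key}'")
            rings = feature["geometry"].get("coordinates") or [[]]
            cutline = feature.get("cutline")
            if cutline is not None and len(cutline) != len(rings[0]):
                problems.append("cutline needs one image point per polygon coordinate")
        elif kind == "Point":
            cp = feature.get("controlpoint")
            if not (isinstance(cp, list) and len(cp) == 2
                    and all(isinstance(v, (int, float)) for v in cp)):
                problems.append("each Point needs a 'controlpoint' of 2 numbers")
    return problems
```

Closing the ring repeats the first mask corner, so `cutline` gets the same extra point. This keeps the one-to-one correspondence between polygon coordinates and image points that the cutline rule requires.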

🌟 TheDJ suggests adding a 'sha1' or 'sha256' field to the pixel mask, to identify and track the exact version of the image that the mapping was made from.[1]
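Here is a sketch of the version check that the `sha1`/`sha256` fields enable: hash the file as currently served and compare the result with the recorded digest, preferring `sha256` when present. Only the `hashlib` calls are standard library; the function names are hypothetical. For Commons files, the MediaWiki API can also report a file's SHA-1 (`prop=imageinfo` with `iiprop=sha1`), which avoids downloading the file.

```python
import hashlib
import io
from typing import BinaryIO, Dict, Mapping


def file_digests(stream: BinaryIO, chunk_size: int = 1 << 20) -> Dict[str, str]:
    """Hash a binary stream in chunks, returning hex SHA-1 and SHA-256 digests."""
    sha1, sha256 = hashlib.sha1(), hashlib.sha256()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        sha1.update(chunk)
        sha256.update(chunk)
    return {"sha1": sha1.hexdigest(), "sha256": sha256.hexdigest()}


def same_file_version(stream: BinaryIO, mask_properties: Mapping[str, object]) -> bool:
    """True if the stream matches the digest recorded in the mask properties."""
    digests = file_digests(stream)
    for key in ("sha256", "sha1"):  # prefer the stronger hash when recorded
        recorded = mask_properties.get(key)
        if isinstance(recorded, str) and recorded:
            return recorded.lower() == digests[key]
    raise ValueError("no sha1 or sha256 recorded; the file version cannot be checked")


if __name__ == "__main__":
    data = b"scanned map bytes"
    props = {"sha1": hashlib.sha1(data).hexdigest(), "sha256": None}
    print(same_file_version(io.BytesIO(data), props))                 # True
    print(same_file_version(io.BytesIO(b"re-uploaded file"), props))  # False
```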

🌟 Multichill proposes (and TheDJ agrees) that the mask and the pixel mask can be merged into a single Feature in the GeoJSON format.

🌟 How should image proportions be handled when placing control points and mask points on the raster map? Should extents be documented in pixels or in relative values?
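One argument for relative values is that they survive a uniform resize of the image, while pixel values must be converted. The sketch below converts between the two, using hypothetical helper names and defining relative values as fractions of image width and height. It is only valid when the new version is the same scan resized uniformly. A re-cropped or re-scanned file needs new control points.

```python
import math
from typing import List, Sequence, Tuple

Pixel = Tuple[float, float]
Size = Tuple[int, int]  # (width, height) in pixels


def to_relative(points: Sequence[Pixel], size: Size) -> List[Pixel]:
    """Express pixel coordinates as fractions of image width and height."""
    width, height = size
    return [(x / width, y / height) for x, y in points]


def to_pixels(points: Sequence[Pixel], size: Size) -> List[Pixel]:
    """Inverse of to_relative for an image of the given size."""
    width, height = size
    return [(x * width, y * height) for x, y in points]


def rescale(points: Sequence[Pixel], old: Size, new: Size) -> List[Pixel]:
    """Move pixel coordinates to a uniformly resized copy of the same image."""
    # Allow ~0.5 % aspect difference so that rounding to whole pixels is tolerated.
    if not math.isclose(old[0] / old[1], new[0] / new[1], rel_tol=0.005):
        raise ValueError("aspect ratio changed; the image was not simply resized")
    return to_pixels(to_relative(points, old), new)
```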

Proposed conclusion

Prepare the proposed schema as JSON Schema! Something like the following, but with up-to-date content. The initial schema was created with ChatGPT from Multichill's example:

{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "type": "object",
  "properties": {
    "license": {
      "type": "string"
    },
    "description": {
      "type": "object",
      "properties": {
        "en": {
          "type": "string"
        }
      },
      "required": ["en"]
    },
    "sources": {
      "type": "string"
    },
    "zoom": {
      "type": "integer"
    },
    "latitude": {
      "type": "number"
    },
    "longitude": {
      "type": "number"
    },
    "data": {
      "type": "object",
      "properties": {
        "type": {
          "type": "string",
          "enum": ["FeatureCollection"]
        },
        "features": {
          "type": "array",
          "items": {
            "oneOf": [
              { "$ref": "#/$defs/maskFeature" },
              { "$ref": "#/$defs/controlPointFeature" }
            ]
          }
        }
      },
      "required": ["type", "features"]
    }
  },
  "required": ["license", "description", "sources", "zoom", "latitude", "longitude", "data"],
  "$defs": {
    "position": {
      "type": "array",
      "items": {
        "type": "number"
      },
      "minItems": 2,
      "maxItems": 2
    },
    "maskFeature": {
      "type": "object",
      "properties": {
        "type": {
          "type": "string",
          "enum": ["Feature"]
        },
        "properties": {
          "type": "object",
          "properties": {
            "type": {
              "type": "string",
              "enum": ["Imagemask"]
            },
            "file": {
              "type": "string"
            },
            "commons_entity": {
              "type": "string"
            },
            "sha1": {
              "type": "string"
            },
            "sha256": {
              "type": ["string", "null"]
            },
            "transformation": {
              "type": "object",
              "properties": {
                "type": {
                  "type": "string",
                  "enum": ["affine"]
                }
              },
              "required": ["type"]
            }
          },
          "required": ["file", "transformation"]
        },
        "geometry": {
          "type": "object",
          "properties": {
            "type": {
              "type": "string",
              "enum": ["Polygon"]
            },
            "coordinates": {
              "type": "array",
              "items": {
                "type": "array",
                "items": {
                  "$ref": "#/$defs/position"
                }
              }
            }
          },
          "required": ["type", "coordinates"]
        },
        "cutline": {
          "type": ["array", "null"],
          "items": {
            "$ref": "#/$defs/position"
          }
        }
      },
      "required": ["type", "properties", "geometry"]
    },
    "controlPointFeature": {
      "type": "object",
      "properties": {
        "type": {
          "type": "string",
          "enum": ["Feature"]
        },
        "geometry": {
          "type": "object",
          "properties": {
            "type": {
              "type": "string",
              "enum": ["Point"]
            },
            "coordinates": {
              "$ref": "#/$defs/position"
            }
          },
          "required": ["type", "coordinates"]
        },
        "controlpoint": {
          "$ref": "#/$defs/position"
        }
      },
      "required": ["type", "geometry", "controlpoint"]
    }
  }
}
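The top-level members required by the schema (license, description, sources, zoom, latitude, longitude, data) are those of a Commons map data (`Data:*.map`) page. Below is a sketch of wrapping a GeoRectify FeatureCollection in such a page, centring the view on the bounding box of all coordinates. The helper name, the default zoom and the `CC0-1.0` license are assumptions, and the simple centring ignores maps that cross the antimeridian.

```python
from typing import Any, Dict, Iterator, Sequence


def _positions(coords) -> Iterator[Sequence[float]]:
    """Yield every [lon, lat] position from nested GeoJSON coordinate arrays."""
    if coords and isinstance(coords[0], (int, float)):
        yield coords
    else:
        for item in coords:
            yield from _positions(item)


def as_commons_map(feature_collection: Dict[str, Any], description_en: str,
                   sources: str, license_id: str = "CC0-1.0",
                   zoom: int = 12) -> Dict[str, Any]:
    """Wrap a FeatureCollection in the top-level members the schema requires."""
    positions = [p for f in feature_collection["features"]
                 for p in _positions((f.get("geometry") or {}).get("coordinates") or [])]
    if not positions:
        raise ValueError("FeatureCollection has no coordinates to centre the map on")
    lons = [p[0] for p in positions]
    lats = [p[1] for p in positions]
    return {
        "license": license_id,
        "description": {"en": description_en},
        "sources": sources,
        "zoom": zoom,
        "latitude": (min(lats) + max(lats)) / 2,   # centre of the bounding box
        "longitude": (min(lons) + max(lons)) / 2,
        "data": feature_collection,
    }
```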


Status & version

Working document / Draft / Under discussion / Proposed to...

Invite to comment and contribute

Contributors

Watching

References

  1. "Wikidata:Property proposal/georeferencing data - Wikidata". www.wikidata.org. Retrieved 2024-09-01.
  2. Spaan, Bert (2019-07-01). "Proposal for Wikimania 2019 Hackathon". Observable. Retrieved 2024-09-01.
  3. Butler, H.; Daly, M.; Doyle, A.; Gillies, Sean; Schaub, T.; Hagen, Stefan (2016-08-01). "The GeoJSON Format". RFC 7946.
  4. "User talk:Multichill/Map warping format - Wikimedia Commons". commons.wikimedia.org. Retrieved 2024-09-01.
  5. "Georeference Extension". iiif.io. Retrieved 2024-09-02.
  6. "bertspaan/georectify-json-spec" (2019-11-19). Retrieved 2024-09-02.