User:Skierpage/type handling

From Meta, a Wikimedia project coordination wiki

Desired

An article about Alpha Centauri says its Radius is [[Attribute:AstronomyRadius:=0.12 Ms]].

The tooltip will say xxx parsecs, nnnnnn m.

The factbox will say xxx parsecs (0.12 Ms, nnnn m).

Whatever value the user entered is preserved in the text and gets a tooltip listing the other values. The infobox presents the value in the attribute's preferred units, followed by the value in other units, followed by the value in the SI unit.

This is how it already works in SMW 0.3; the only difference is that the attribute now controls the preferred units, overriding the type. If an attribute doesn't say, the standard unit will be SI.

The trick is to continue storing the attribute in smw_attributes using the primaryUnit (SI). Even though you're displaying 22046 carats for Gem_mass, you don't want to store

Cullinan_Diamond  	Gem_mass  	carats  	mass  	22046.2262185  	22046.2262185

you still want to store in kg.
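
A minimal sketch of the idea in Python (not SMW's actual PHP code): convert to kg before storing, and convert back to the attribute's preferred unit only for display. The function names and the `TO_KG` table are hypothetical, and the carat factor assumes the metric carat of 200 mg.

```python
# Hypothetical conversion table: factor that turns one unit into kg (SI).
TO_KG = {
    "kg": 1.0,
    "g": 0.001,
    "carats": 0.0002,  # assumes the metric carat, 200 mg
}

def to_storage(value, unit):
    """Convert a user-entered mass to kg, the unit kept in smw_attributes."""
    return value * TO_KG[unit]

def to_display(kg_value, preferred_unit):
    """Convert a stored kg value back to the attribute's preferred unit."""
    return kg_value / TO_KG[preferred_unit]

stored = to_storage(22046.2262185, "carats")  # what goes in the table, in kg
shown = to_display(stored, "carats")          # what the factbox displays
```

Storing one unit per quantity keeps values comparable across attributes, whatever unit each attribute prefers for display.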


Non-issues

We don't need to over-design. Note that there are lots of units in Wikipedia articles that are not used to express facts about the article. E.g. articles say microscopes can resolve to the Ångström level, or that bonds are a few Ångströms long, but I can't find an article that has a fact about itself specified in Ångströms.

Design

Preferred Units in attribute

I think we want a two-level system.

There are the fundamental SI units, e.g. m for length. These datatypes live in the Type: namespace as now. But there's only one of these per kind of quantity, rather than a separate type such as Type:Geographic length.

Then there are the customary units for certain domains, e.g. km for geographical lengths and nm for colors like w:red. These are easy power-of-10 conversions. But then there are difficult lengths like parsecs, AUs, and light years; see http://en.wikipedia.org/wiki/Alpha_Centauri

Rather than have a fundamental datatype for each kind of length, each attribute should be able to specify its main unit ('STDUNIT' in the code) and a list of common units (the keys of the 'VALUES' array in the code) for the tooltip and infobox. The implementation of the fundamental datatype can convert any length to these. By convention the STDUNIT is the first in the list of common units.
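
Roughly, the single length datatype could look like the Python sketch below (the real code is PHP; `METRES_PER_UNIT` and `process_value` are names made up for illustration). The attribute supplies the list of common units, and the datatype converts any input to all of them. The AU and light-year factors are exact by definition; the parsec factor is rounded. The 4.37 ly input is only an illustrative value.

```python
# Metres per unit for one fundamental "length" datatype.
METRES_PER_UNIT = {
    "m": 1.0,
    "km": 1000.0,
    "mi": 1609.344,
    "AU": 149597870700.0,
    "ly": 9460730472580800.0,
    "pc": 3.0856775814913673e16,  # rounded
}

def process_value(value, unit, common_units):
    """Return (std_unit, values) for an attribute's list of common units.

    values maps each common unit to the converted value, in list order.
    By convention the first entry of common_units is the main unit.
    """
    metres = value * METRES_PER_UNIT[unit]
    values = {u: metres / METRES_PER_UNIT[u] for u in common_units}
    return common_units[0], values

# An astronomy attribute that prefers parsecs, then light years, then SI.
std_unit, values = process_value(4.37, "ly", ["pc", "ly", "m"])
```

A geographic attribute would call the same function with a list like `["km", "mi", "m"]`, so no second datatype is needed.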

Not sure how to express this list of preferred unit formats on an Attribute: page. As a template? A magic attribute like [[UnitList:=ha,km²,mi²]] ?? Yet another namespace, Format:, pages? Should this live in the regular smw_attributes table or the smw_specialprops table?
I implemented this in May 2006 as an additional parsing of Attribute: pages for a UnitList attribute that is stored in smw_specialprops.

Markus Krötzsch commented 2006-05-08:

"The prefered units for this attribute are [[displays main unit:=km]], [[displays unit:=ha]], and [[displays unit:=mi]]."

In short: prefer many simple statements over a new complex "unit list" type, which again needs internationalisation etc. Very specific names for built-in attributes are preferred since this makes clashes with user-defined attributes less likely.

and indeed that's a better system.
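
As a rough illustration of how simple the "many simple statements" approach is to read back, here is a Python sketch that collects those statements from the text of an Attribute: page. The regular expression and the `preferred_units` function are assumptions for illustration, not SMW's parser.

```python
import re

# Matches [[displays main unit:=km]] and [[displays unit:=ha]] statements.
STATEMENT = re.compile(r"\[\[displays (main )?unit:=([^\]|]+)\]\]")

def preferred_units(page_text):
    """Return the unit list for an Attribute: page, main unit first."""
    main, others = [], []
    for is_main, unit in STATEMENT.findall(page_text):
        (main if is_main else others).append(unit.strip())
    return main + others

text = ("The prefered units for this attribute are [[displays main unit:=km]], "
        "[[displays unit:=ha]], and [[displays unit:=mi]].")
units = preferred_units(text)
```

Putting the main unit first matches the convention that the STDUNIT leads the list of common units.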

Format information

Preferred units still aren't enough. You want to be able to specify a format, especially for times.

We could possibly use a strftime format string for date/time, or a sprintf format string (which I think implies a number of decimal places, certainly for decimal numbers).

This could go in the attribute page along with the unit list [[UnitList:="%5.2e" ha,"%2.3f" km²,"%2.4f" mi²]] ??
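
To make the proposal concrete, a Python sketch of reading such a list and applying the formats. Python's `%` operator follows printf conventions for these conversions; `parse_unit_list` and `render` are hypothetical names, and the value passed to `render` is assumed to be already converted to that unit.

```python
def parse_unit_list(spec):
    """Split '"%5.2e" ha,"%2.3f" km²' into [(format, unit), ...]."""
    entries = []
    for item in spec.split(","):
        fmt, unit = item.strip().split(" ", 1)
        entries.append((fmt.strip('"'), unit.strip()))
    return entries

def render(value, fmt, unit):
    """Format a number with a printf-style format string and append its unit."""
    return (fmt % value) + " " + unit

entries = parse_unit_list('"%5.2e" ha,"%2.3f" km²,"%2.4f" mi²')
rendered = render(12345.678, *entries[0])
```

Note that the simple comma split would break on a format string that itself contains a comma, which is one reason separate statements per unit may be easier to handle.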

Precision information

"Cheetahs run at speeds up to speed:=60 mph" gets converted to speed: 96.56064 km/h. This is misleading: it implies far more precision than the original value has.

Users have proposed being able to specify precision, e.g. 60 mph +/-2 or +/- 5%. That would aid in appropriate rounding after conversion, thus speed: 97 km/h.
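
One possible rounding rule, sketched in Python under the assumption that the user supplies a positive uncertainty: convert the uncertainty along with the value, then round the value to the decimal position of the uncertainty's leading digit. The function name is hypothetical, and this is just one reasonable rule, not a settled design.

```python
import math

KMH_PER_MPH = 1.609344  # exact: 1 mile = 1.609344 km

def convert_with_precision(value, uncertainty, factor):
    """Convert value, rounded to the leading digit of the converted uncertainty.

    Assumes uncertainty is non-zero (log10 of zero is undefined).
    """
    converted = value * factor
    converted_uncertainty = abs(uncertainty * factor)
    # Position of the uncertainty's leading digit, e.g. 3.2 -> 0, 0.05 -> -2.
    exponent = math.floor(math.log10(converted_uncertainty))
    return round(converted, -exponent)

absolute = convert_with_precision(60, 2, KMH_PER_MPH)          # 60 mph +/- 2
relative = convert_with_precision(60, 60 * 0.05, KMH_PER_MPH)  # 60 mph +/- 5%
```

Both calls give 97 km/h rather than 96.56064 km/h.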

Could possibly use strftime format string for date/time, or sprintf format string (which I think implies a number of places, certainly for decimal).

Should this information go into

The most complete implementation of unit conversions that I've found is in the Perl Data-Dimensions module; it implements some of the advanced ideas that user:movGP0 mentions.

w:User:Egil/Sandbox/units lists a ton of obscure units.

Specific Types and Attributes

When exporting types in RDF format, we use XML Schema datatypes (XSD) from http://www.w3.org/TR/xmlschema-2/. I think we also want to import RDF, so we should be able to read XML Schema datatypes. Can anyone confirm this? Parsing some XSD stuff is hard!


Issues

Also see the limitations of the current implementation.

  • Apart from Americans with their w:Imperial units, everyone else is happy with the w:metric system. So do we even bother with the lists of formats? (Yes)
  • Do users get to pick their preferred formats?
  • Does smw_attributes store values converted down to fundamental SI units, or to preferred units?
    • Can databases handle vast ranges (scientific notation?)
  • Maybe group attributes (with /?) so they can share layouts.


Notes on Current Implementation

Max Völkel sums it up nicely in http://wiki.ontoworld.org/wiki/Talk:Built_in_Types

  • Attribute:Length says it has type Geographical Length. That's backward. The fundamental type is length, and attributes like Geographical Length can say they prefer a particular unit.
  • In the ConvertXxx() functions, all the 'UNIT' specifying is useless. The only UNITs used are the keys in the array returned by processValue().
  • If the math is slow, an enhancement would be to remember the incoming unit, to avoid reconverting on output. That also avoids rounding errors, though I haven't seen any.
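
The last point could be sketched like this in Python (a hypothetical `StoredLength` class, not the SMW code): keep the SI value for storage and comparison, but also keep what the user typed, so that output in the same unit needs no conversion at all.

```python
from dataclasses import dataclass

METRES_PER_UNIT = {"m": 1.0, "km": 1000.0, "mi": 1609.344}

@dataclass
class StoredLength:
    si_value: float       # value in metres, used for storage and comparison
    entered_value: float  # the number the user typed
    entered_unit: str     # the unit the user typed

    @classmethod
    def from_input(cls, value, unit):
        return cls(value * METRES_PER_UNIT[unit], value, unit)

    def in_unit(self, unit):
        # Reuse the original number when the requested unit is the one entered.
        if unit == self.entered_unit:
            return self.entered_value
        return self.si_value / METRES_PER_UNIT[unit]

length = StoredLength.from_input(26.2, "mi")
```

Here `length.in_unit("mi")` returns exactly the entered 26.2, with no round trip through metres.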

Limitations

  • Doesn't support scientific notation, like 2.997 E8 m/s.
    • I have the necessary update to SMW_DT_Float.php, though it's a complicated (slow?) regexp.
  • You can't make the units into a link. This is really common usage!
    • Actually, I think [[mass:=6e1 kg|60]] [[kg]] works but is kludgy. Better if the vertical bar would trim the units?!
  • Doesn't handle SI prefixes like m, k, M, etc.
  • Incomplete localization, e.g. SMWConvertGeographicArea has kilometres. More precisely, it is globalized for some languages but not localized.
  • Search is just on strings, so no ranges.
    • Denny added value_num for numeric storage.
    • An ASCII comparison of XSD dateTimes and XSD durations will work.
    • But it does seem to attempt unit conversion. It converts results on output.
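
For the scientific notation limitation, the parsing could look roughly like the Python sketch below. This is not the SMW_DT_Float.php regexp, just an illustration of the idea; `NUMBER_WITH_UNIT` and `parse_value` are hypothetical names. The exponent is only recognized when digits follow the E, so a unit such as eV is left alone.

```python
import re

NUMBER_WITH_UNIT = re.compile(
    r"^\s*([+-]?(?:\d+\.?\d*|\.\d+))"  # mantissa: 2.997, 60, .5
    r"(?:\s*[eE]\s*([+-]?\d+))?"       # optional exponent: E8, e-3, " E8"
    r"\s*(.*?)\s*$"                    # everything else is the unit
)

def parse_value(text):
    """Split '2.997 E8 m/s' into a float and a unit string."""
    match = NUMBER_WITH_UNIT.match(text)
    if match is None:
        raise ValueError("not a number with unit: %r" % text)
    mantissa, exponent, unit = match.groups()
    return float(mantissa + "e" + (exponent or "0")), unit

speed = parse_value("2.997 E8 m/s")
mass = parse_value("6e1 kg")
energy = parse_value("5 eV")
```

The numeric part is what would go into value_num, which is what makes range searches possible.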

TODO

mass: Daltons

1 u = 1/N_A gram = 1/(1000 N_A) kg   (where N_A is Avogadro's number)
1 u ≈ 1.66053886 × 10⁻²⁷ kg ≈ 931.49 MeV/c²
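
As a quick check of the arithmetic, a Python sketch that derives the conversion factor from Avogadro's number using the relation above. The value of N_A used here is the one fixed by the 2019 SI definition, so the result agrees with the figure quoted above to about seven significant digits and differs in the later ones; treat the factor as approximate. `DALTON_KG` and `daltons_to_kg` are illustrative names.

```python
AVOGADRO = 6.02214076e23  # per mole; fixed by definition in the SI since 2019

# 1 u = 1/N_A gram = 1/(1000 N_A) kg, following the relation in the note.
DALTON_KG = 1.0 / (1000.0 * AVOGADRO)

def daltons_to_kg(mass_u):
    """Convert a mass in daltons (u) to kilograms."""
    return mass_u * DALTON_KG
```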

length: light years, parsecs, etc.

See Also

http://en.wikipedia.org/wiki/Imperial_units